Best way to intercept terminal escape sequences?

unofficial mirror of emacs-devel@gnu.org 

* Best way to intercept terminal escape sequences?
@ 2010-08-26 15:58 Ryan Johnson
  2010-08-26 23:03 ` Stefan Monnier
  0 siblings, 1 reply; 21+ messages in thread
From: Ryan Johnson @ 2010-08-26 15:58 UTC (permalink / raw)
  To: emacs-devel

  Hi all,

Terminal escape sequences cause emacs a lot of pain, with problems like
bug #6758 and the contortions performed by xt-mouse.el being just two
examples.

I've been wrestling with xt-mouse.el for the last week and it seems to
me that many or all of these problems arise because escape sequences are
treated like any other string of characters when in fact they are nearly
all encoded events of some sort. For example, the xterm escape sequence
"ESC O D" is eventually converted to <left>, but anybody calling
read-char or read-event will get a string of characters instead (and
probably wish they hadn't). This is why xt-mouse can't do full mouse
handling -- mouse-drag-track (mouse.el) calls read-event to watch for
the mouse-up event. If xt-mouse simply translated mouse escapes into
events, mouse-drag-track would intercept the mouse-up sequence and think
it was garbage, with xt-mouse never having a chance to decode it. So,
instead, xt-mouse reinvents the wheel (poorly) and has to jump through
hoops to send [down-mouse-1 move-mouse-1 mouse-1] all at once to
simulate a mouse drag after it has finished.

Problems like this could be solved by allowing the system to process
terminal escape sequences early in the food chain (= before read-char
and read-event). The idea would be to let a terminal translator
interpose on the keyboard input before anything else -- even coding
systems -- and filter known escape sequences. The interposition would
have two key features:

1. Recognized key sequences can be absorbed completely or (more likely)
converted to events which appear at read-event in the proper order -- (a
ESC O D b) would make three read-event calls return (a <left> b).
2. Unrecognized sequences are passed through, unchanged, for normal
handling by the rest of the input processing chain -- (ESC x x) would be
left untouched.
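
A minimal sketch of the two rules above, working on a plain list of
characters rather than on real keyboard input. The names
`my-escape-table', `my-list-prefix-p' and `my-filter-escapes' are
hypothetical (nothing like them exists in Emacs), and the table
contents are assumptions chosen for illustration:

```elisp
;; Hypothetical table of recognized sequences; not an Emacs variable.
(defvar my-escape-table
  '(((?\e ?O ?A) . up)
    ((?\e ?O ?B) . down))
  "Alist mapping raw character sequences to event symbols.")

(defun my-list-prefix-p (prefix list)
  "Return non-nil if PREFIX is a prefix of LIST."
  (while (and prefix list (eql (car prefix) (car list)))
    (setq prefix (cdr prefix)
          list (cdr list)))
  (null prefix))

(defun my-filter-escapes (input)
  "Return INPUT with recognized escape sequences replaced by events.
INPUT is a list of characters.  Anything not found in
`my-escape-table' is passed through unchanged."
  (let (out)
    (while input
      (let (match)
        ;; Find the first table entry that INPUT starts with.
        (dolist (entry my-escape-table)
          (when (and (not match)
                     (my-list-prefix-p (car entry) input))
            (setq match entry)))
        (if match
            ;; Rule 1: replace the whole sequence with one event.
            (progn
              (push (cdr match) out)
              (setq input (nthcdr (length (car match)) input)))
          ;; Rule 2: pass the character through untouched.
          (push (car input) out)
          (setq input (cdr input)))))
    (nreverse out)))
```

With this table, (my-filter-escapes (list ?a ?\e ?O ?A ?b)) gives
(97 up 98), while (ESC x x) comes back unchanged. The sketch ignores
the hard part, which is deciding how long to wait when the input ends
in the middle of a possible sequence.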

With this interposition in place, terminal events could become first
class citizens. Function key presses would appear to *everyone* as such,
mouse events would be processed by the normal mouse-handling code, etc.

The input-decode-map introduced by v23 is a nice start (it can absorb
the sequence or generate events as needed), but it doesn't pass through
unrecognized sequences and read-char/read-event bypass it.

Unfortunately read-* are written in C and can't be advised effectively
(I tried but the warnings were accurate). Input methods don't seem to
work, even if I knew how to create one (they receive printable chars
only and the read-* have to ask for them). I also tried creating a
keyboard coding system to intercept terminal escape sequences as a
proof-of-concept, but was thwarted by bug #6920 (and that's not the
right place for it anyway).

Alternatively, I toyed with the idea of making mouse.el install an
overriding-terminal-local-map (to silently ignore everything except ^G
and mouse events) instead of calling read-event, but that would require
significant rewriting and isn't a general fix for the escape sequence
problem. I tried source diving for more ideas, but read_filtered_event
(lread.c) calls read_char (keyboard.c), which is 900 lines of Greek to me.

Thoughts? What would be the least intrusive way to support escape
sequences better? Or is there a better way mouse.el should process
events so the existing input-decode-map becomes effective?

(please CC me in all replies -- I'm signed up for daily digests)

Thanks,
Ryan

P.S. Apologies if this double-posts -- my mailer was messed up.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Best way to intercept terminal escape sequences?
  2010-08-26 15:58 Best way to intercept terminal escape sequences? Ryan Johnson
@ 2010-08-26 23:03 ` Stefan Monnier
  2010-08-27  9:28   ` Ryan Johnson
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Monnier @ 2010-08-26 23:03 UTC (permalink / raw)
  To: Ryan Johnson; +Cc: emacs-devel

> all encoded events of some sort. For example, the xterm escape sequence
> "ESC O D" is eventually converted to <left>, but anybody calling
> read-char or read-event will get a string of characters instead (and
> probably wish they hadn't).

Yes, that's why they should use `read-key' instead.

> This is why xt-mouse can't do full mouse handling -- mouse-drag-track
> (mouse.el) calls read-event to watch for the mouse-up event.

And indeed, mouse-drag-track should also use read-key, IIUC.
But read-key is brand new, so a lot of code needs to be adapted to
use it.  Patches welcome.

> Problems like this could be solved by allowing the system to process
> terminal escape sequences early in the food chain (= before read-char
> and read-event).

Packages like, ahem, xt-mouse.el need lower-level access, so we have to
keep lower-level primitives.


        Stefan





* Re: Best way to intercept terminal escape sequences?
  2010-08-26 23:03 ` Stefan Monnier
@ 2010-08-27  9:28   ` Ryan Johnson
  2010-08-27 10:36     ` David Kastrup
  0 siblings, 1 reply; 21+ messages in thread
From: Ryan Johnson @ 2010-08-27  9:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel


  On 8/27/2010 1:03 AM, Stefan Monnier wrote:
>> all encoded events of some sort. For example, the xterm escape sequence
>> "ESC O D" is eventually converted to <left>, but anybody calling
>> read-char or read-event will get a string of characters instead (and
>> probably wish they hadn't).
> Yes, that's why they should use `read-key' instead.
!!

I didn't know about that function. I tried s/read-event/read-key/ in
mouse.el, and nothing obvious broke. However, I'll need to modify 
xt-mouse.el to take advantage of the change -- it still doesn't trust 
mouse.el.

Meanwhile, though, I'm hitting another problem which still seems to 
require a lower-than-low-level equivalent for read-char which can bypass 
coding systems.

For reasons I don't understand, xterm's mouse escape sequence is 
completely non-standard: "ESC [ M pb px py". The p* are bytes taking 
values between 33 and 255, and there is no terminator byte.***
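
To make the byte layout concrete, here is a small sketch that builds
and takes apart one such report. The offsets (#o40 for the button
byte, #o40 plus 1 for each coordinate) are the ones xt-mouse.el
subtracts in xterm-mouse-event; the function names are hypothetical:

```elisp
(defun my-xterm-mouse-encode (button x y)
  "Return the six bytes of one xterm mouse report as a unibyte string.
BUTTON is the button code; X and Y are zero-based coordinates."
  (unibyte-string ?\e ?\[ ?M
                  (+ #o40 button)
                  (+ #o40 1 x)
                  (+ #o40 1 y)))

(defun my-xterm-mouse-decode (bytes)
  "Return (BUTTON X Y) for BYTES, a unibyte string holding one report."
  (list (- (aref bytes 3) #o40)
        (- (aref bytes 4) #o40 1)
        (- (aref bytes 5) #o40 1)))
```

For button code 0 at (184 . 136) the six bytes are 27 91 77 32 217
169, so both coordinate bytes are already above #x7f.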

The result of this unfortunate design decision is that px and py can 
throw a huge wrench in emacs' utf-8 decoding, because a (px py) pair can 
look like all kinds of utf-8 (valid or otherwise). As long as px < 0xe0 
*and* py < 0xc0 I can reliably decompose the unicode char I'm given into
the original utf-8 sequence (possibly in the illegal C0 or C1 range) and 
py follows immediately to satisfy the decoder. The rest of the time, 
though, the lack of a sequence terminator leaves me at the decoder's 
mercy to decide how to interpret the bytes, and it's at the mercy of 
whatever input follows the mouse sequence.
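
A sketch of the collision, outside of any keyboard input: the (px py)
pair for position (184 . 136) is the byte pair #xd9 #xa9, which is
also a valid two-byte UTF-8 sequence. The helper names are
hypothetical, and the second one only covers the two-byte case using
the same arithmetic as the attached patch:

```elisp
(defun my-utf8-decode-pair (px py)
  "Decode the two bytes PX PY as UTF-8 and return the resulting string."
  (decode-coding-string (unibyte-string px py) 'utf-8))

(defun my-char-to-utf8-pair (c)
  "Split character C back into its (LEAD . TRAIL) UTF-8 bytes.
Only meaningful for characters that encode as two bytes."
  (cons (+ #xc0 (ash c -6))
        (+ #x80 (logand c #x3f))))

;; The two coordinate bytes collapse into a single character ...
(my-utf8-decode-pair #xd9 #xa9)   ; a one-character string, U+0669
;; ... which can be picked apart again to recover px and py.
(my-char-to-utf8-pair #x669)      ; (217 . 169), i.e. (#xd9 . #xa9)
```

This only shows the recoverable case. It says nothing about what the
decoder does when py itself looks like the start of a sequence and no
further input arrives.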

I tried setting the keyboard-coding-system to iso-latin-1, but no luck.

For down-mouse and move-mouse, things work out all right. A mouse-up or
mouse-move always follows and the leading ESC makes emacs give up and 
hand over the raw bytes. The up-mouse sequence, though, doesn't usually 
have anything after it, so emacs waits patiently for more input. 
Clicking at position (184 . 136), for example, leaves the minibuffer 
prompt: "ESC [ M SPC \300\270 \300\210 ESC [ M #-". Fortunately, ^G 
works as expected, so it's merely annoying. This is a huge improvement 
over the current practice of dumping a bunch of garbage bytes into the 
user's buffer, so I'm attaching a preliminary patch (against the 
emacs-23.2 release, btw).

Thoughts?
Ryan

*** It *really* should have been something like "ESC [ M pb ; px ; py m" 
with the p* being string representations of integers. I'm discussing 
possible fixes with the xterm maintainer, but it could be a while (and 
would require creating a new mouse mode to avoid breaking existing apps).
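
For comparison, a sketch of how simple a parser for the terminated,
text-based form suggested in this footnote would be. The format is
only the proposal made here, not something xterm is known to send,
and `my-parse-text-mouse-report' is a hypothetical name:

```elisp
(defun my-parse-text-mouse-report (seq)
  "Parse SEQ of the form ESC [ M pb ; px ; py m.
The three fields are decimal integers.  Return (PB PX PY), or nil
if SEQ does not have that form."
  (when (string-match
         "\\`\e\\[M\\([0-9]+\\);\\([0-9]+\\);\\([0-9]+\\)m\\'"
         seq)
    (list (string-to-number (match-string 1 seq))
          (string-to-number (match-string 2 seq))
          (string-to-number (match-string 3 seq)))))
```

Every byte in such a report is plain ASCII and the final "m" marks the
end, so neither the UTF-8 decoder nor a missing terminator can get in
the way. For example, "\e[M0;184;136m" parses to (0 184 136).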

[-- Attachment #2: xt-mouse.el.utf-8.patch --]
[-- Type: text/plain, Size: 2464 bytes --]

--- xt-mouse.new.el	2010-08-27 09:22:47.640625000 +0200
+++ xt-mouse.utf-8.el	2010-08-27 11:02:20.031250000 +0200
@@ -120,16 +120,36 @@
   pos)

 ;; read xterm sequences above ascii 127 (#x7f)
-(defun xterm-mouse-event-read ()
-  (let ((c (read-char)))
+(defun xterm-mouse-pos-read ()
+  "Read positions from an xterm mouse escape sequence.
+
+This job is complicated because xterm can emit (px py) pairs
+which look like (possibly invalid) UTF-8 sequences which emacs
+dutifully decodes into unicode characters.
+
+UTF-8 sequencing can occur any time px and py are both greater
+than #x80.  For terminals smaller than 128x128, this function can
+correct the problem because only px can trigger the
+confusion. Terminals taller than 128 lines pose a more difficult
+problem because the py can look like the start of a 2-byte UTF-8
+sequence, and xterm sends no sequence terminator which we could
+use to detect the 'EOF'. This tends to leave emacs waiting for
+input after mouse-up."
+  (let* ((c (read-char))
+	 (b (multibyte-char-to-unibyte c)))
     (cond
      ;; mouse clicks outside the encodable range produce 0
-     ((= c 0) #x100)
-     ;; 8-bit control characters which don't pair up with the next
-     ;; char come back as "pseudo-negative", e.g. #x3fff??
-     ((> c #x7ff) (logand c #xff))
+     ((eq b 0) (cons #x100 (multibyte-char-to-unibyte (read-char))))
+     ;; emacs treats some combinations of px py like utf-8
+     ((or
+       ;; normal ("C2") sequences don't convert back to a single byte
+       (eq b -1)
+       ;; illegal ("C0") sequences convert back to one (legal) char
+       (not (eq c (unibyte-char-to-multibyte b))))
+      ;; pick the char apart into two bytes
+      (cons (+ #xc0 (lsh c -6)) (+ #x80 (logand c #x3f))))
      ;; normal case
-     (c))))
+     ((cons b (multibyte-char-to-unibyte (read-char)))))))

 (defun xterm-mouse-truncate-wrap (f)
   "Truncate with wrap-around."
@@ -148,9 +168,10 @@

 (defun xterm-mouse-event ()
   "Convert XTerm mouse event to Emacs mouse event."
-  (let* ((type (- (xterm-mouse-event-read) #o40))
-	 (x (- (xterm-mouse-event-read) #o40 1))
-	 (y (- (xterm-mouse-event-read) #o40 1))
+  (let* ((type (- (read-char) #o40))
+	 (pos (xterm-mouse-pos-read))
+	 (x (- (car pos) #o40 1))
+	 (y (- (cdr pos) #o40 1))
 	 ;; Emulate timestamp information.  This is accurate enough
 	 ;; for default value of mouse-1-click-follows-link (450msec).
 	 (timestamp (xterm-mouse-truncate-wrap


* Re: Best way to intercept terminal escape sequences?
  2010-08-27  9:28   ` Ryan Johnson
@ 2010-08-27 10:36     ` David Kastrup
  2010-08-27 23:50       ` Stefan Monnier
  0 siblings, 1 reply; 21+ messages in thread
From: David Kastrup @ 2010-08-27 10:36 UTC (permalink / raw)
  To: emacs-devel

Ryan Johnson <ryanjohn@ece.cmu.edu> writes:

>  On 8/27/2010 1:03 AM, Stefan Monnier wrote:
>>> all encoded events of some sort. For example, the xterm escape sequence
>>> "ESC O D" is eventually converted to <left>, but anybody calling
>>> read-char or read-event will get a string of characters instead (and
>>> probably wish they hadn't).
>> Yes, that's why they should use `read-key' instead.
> !!
>
> I didn't know about that function. I tried s/read-event/read-key/ in
> mouse.el, and nothing obvious broke. However, I'll need to modify
> xt-mouse.el to take advantage of the change -- it still doesn't trust
> mouse.el.
>
> Meanwhile, though, I'm hitting another problem which still seems to
> require a lower-than-low-level equivalent for read-char which can
> bypass coding systems.
>
> For reasons I don't understand, xterm's mouse escape sequence is
> completely non-standard: "ESC [ M pb px py". The p* are bytes taking
> values between 33 and 255, and there is no terminator byte.***
>
> The result of this unfortunate design decision is that px and py can
> throw a huge wrench in emacs' utf-8 decoding, because a (px py) pair
> can look like all kinds of utf-8 (valid or otherwise). As long as px <
> 0xe0 *and* py < 0xc0 I can reliably decompose the unicode char I'm
> given into the original utf-8 sequence (possibly in the illegal C0 or
> C1 range) and py follows immediately to satisfy the decoder. The rest
> of the time, though, the lack of a sequence terminator leaves me at
> the decoder's mercy to decide how to interpret the bytes, and it's at
> the mercy of whatever input follows the mouse sequence.
>
> I tried setting the keyboard-coding-system to iso-latin-1, but no
> luck.

You should set it to raw-text, do your mouse code preprocessing, and
afterwards decode the remainder using the intended coding system.
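
A minimal sketch of that order of operations, applied to a byte string
that has already been captured undecoded. Whether a package can
actually get at keyboard input this early is a separate question that
the sketch does not answer. The function name is hypothetical and the
report layout (ESC [ M followed by three bytes) is the one described
in the quoted message:

```elisp
(defun my-split-mouse-and-text (raw coding)
  "Separate xterm mouse reports from the text in RAW.
RAW is a unibyte string of undecoded input.  Return a cons
\(REPORTS . TEXT), where REPORTS is a list of (BUTTON X Y) lists and
TEXT is the remaining bytes decoded with CODING."
  (let ((i 0)
        (n (length raw))
        reports text-bytes)
    (while (< i n)
      (if (and (<= (+ i 6) n)
               (eq (aref raw i) ?\e)
               (eq (aref raw (+ i 1)) ?\[)
               (eq (aref raw (+ i 2)) ?M))
          ;; A complete six-byte report: take it out before decoding.
          (progn
            (push (list (- (aref raw (+ i 3)) 32)
                        (- (aref raw (+ i 4)) 33)
                        (- (aref raw (+ i 5)) 33))
                  reports)
            (setq i (+ i 6)))
        ;; Anything else is kept for the real coding system.
        (push (aref raw i) text-bytes)
        (setq i (1+ i))))
    (cons (nreverse reports)
          (decode-coding-string
           (apply #'unibyte-string (nreverse text-bytes))
           coding))))
```

Given the bytes for "a", one report at (184 . 136), and the UTF-8
encoding of e-acute, this returns one report and the two-character
text, with the coordinate bytes never reaching the UTF-8 decoder. The
sketch loses the relative order of reports and text, which a real
implementation would have to preserve.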

-- 
David Kastrup





* Re: Best way to intercept terminal escape sequences?
  2010-08-27 10:36     ` David Kastrup
@ 2010-08-27 23:50       ` Stefan Monnier
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Monnier @ 2010-08-27 23:50 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

> You should set it to raw-text, do your mouse code preprocessing, and
> afterwards decode the remainder using the intended coding system.

I'm not sure when this decoding takes place nowadays (it used to be
performed in key-translation-map, which was too late), but if it's done
very early as I suspect, changing keyboard-coding-system might not have
any effect because the translation might have taken place already.

        Stefan


[parent not found: <20100827142724.E1DD712F@hazard.ece.cmu.edu>]

* Re: Best way to intercept terminal escape sequences?
       [not found] <20100827142724.E1DD712F@hazard.ece.cmu.edu>
@ 2010-08-27 14:44 ` Ryan Johnson
  2010-08-27 15:40   ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Ryan Johnson @ 2010-08-27 14:44 UTC (permalink / raw)
  To: emacs-devel

  On Fri, 27 Aug 2010 16:17:14 +0200, David Kastrup wrote:
>>   On Fri, 27 Aug 2010 12:36:52 +0200, David Kastrup wrote:
>>> Ryan Johnson <ryanjohn@ece.cmu.edu> writes:
>>>> I tried setting the keyboard-coding-system to iso-latin-1, but no
>>>> luck.
>>> You should set it to raw-text, do your mouse code preprocessing, and
>>> afterwards decode the remainder using the intended coding system.
>> Like this?
>> (defun xterm-mouse-pos-read ()
>>    (let ((old-coding (keyboard-coding-system)))
>>      (set-keyboard-coding-system 'raw-text)
>>      (unwind-protect
>>          (cons (xterm-mouse-event-read) (xterm-mouse-event-read))
>>        (set-keyboard-coding-system old-coding))))
> You can't go setting the keyboard reading system back and forth.  It
> operates into a buffer even before calling read-char.  You have to put
> it to raw and stick with it.
Makes sense. That's why I was hoping there was a way to intercept input
before coding systems get their claws on it. Otherwise xt-mouse finds 
itself worrying about what coding system the user has requested, whether 
it changed recently, etc.

If coding systems stacked (and if user-defined ones worked properly) it 
might be easier, but AFAIK neither is true.

Is there really no other way to do this?

Ryan




* Re: Best way to intercept terminal escape sequences?
  2010-08-27 14:44 ` Ryan Johnson
@ 2010-08-27 15:40   ` Eli Zaretskii
  2010-08-27 18:04     ` Ryan Johnson
  2010-08-27 23:54     ` Stefan Monnier
  0 siblings, 2 replies; 21+ messages in thread
From: Eli Zaretskii @ 2010-08-27 15:40 UTC (permalink / raw)
  To: Ryan Johnson; +Cc: emacs-devel

> Date: Fri, 27 Aug 2010 16:44:12 +0200
> From: Ryan Johnson <ryanjohn@ece.cmu.edu>
> 
> Is there really no other way to do this?

You could use the :post-read-conversion attribute of a coding-system.
That is, define a new coding-system that has this attribute specifying
a function you will write.  That function will first decode the mouse
stuff, and then decode the rest by the terminal-coding-system set by
the user.

You can see an example of this in ctext-with-extensions.  It is
defined in mule-conf.el and its post-read-conversion function is
defined in mule.el.

Other than that, it's no surprise that this is not easy: we ask Emacs
to read keyboard input that is encoded in two different encodings,
which is not how keyboard input was designed.

I think the only way that is easier and cleaner would be if Emacs
could read the mouse input from a separate file descriptor.  We could
then set that file descriptor to use a different encoding.  Of course,
that would need low-level changes in Emacs, even if it is possible in
xterm.


* Re: Best way to intercept terminal escape sequences?
  2010-08-27 15:40   ` Eli Zaretskii
@ 2010-08-27 18:04     ` Ryan Johnson
  2010-08-27 20:38       ` Eli Zaretskii
  2010-08-27 23:54     ` Stefan Monnier
  1 sibling, 1 reply; 21+ messages in thread
From: Ryan Johnson @ 2010-08-27 18:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

  On 8/27/2010 5:40 PM, Eli Zaretskii wrote:
>> Date: Fri, 27 Aug 2010 16:44:12 +0200
>> From: Ryan Johnson <ryanjohn@ece.cmu.edu>
>>
>> Is there really no other way to do this?
> You could use the :post-read-conversion attribute of a coding-system.
> That is, define a new coding-system that has this attribute specifying
> a function you will write.  That function will first decode the mouse
> stuff, and then decode the rest by the terminal-coding-system set by
> the user.
> You can see an example of this in ctext-with-extensions.  It is
> defined in mule-conf.el and its post-read-conversion function is
> defined in mule.el.
That's actually what I tried first, but as I mentioned in the OP, emacs 
doesn't always deliver characters to :post-read-conversion in the 
correct order, which makes it impossible to do anything reliable with 
either escape sequences or utf-8. See 
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6920 for more details.

The other problem is, I need to convert escape sequences into mouse 
events, not characters. Even if the bug were fixed, and even if it were 
easy to detect the user's desired coding system and piggy-back on it 
reliably, there's a problem with event ordering. I can't put a mouse 
event in the buffer, so if I'm ever given text and mouse-escape-sequence 
I'd have to leave the buffer empty, then load up unread-command-events 
with both text and mouse events. I'm not sure where those rejoin the 
input processing chain, but even with (t . EVT) I suspect they skip 
something (input methods, perhaps?). Also, I had some trouble getting (t 
. EVT) to work from within :post-read-conversion, though that could just 
be a bug in my code.

> Other than that, it's no surprise that this is not easy: we ask Emacs
> to read keyboard input that is encoded in two different encodings,
> which is not how keyboard input was designed.
Not quite... emacs insists on interpreting everything as encoded in some 
way, and I'd really like to just get my hands on a few raw bytes before 
it does so.

> I think the only way that is easier and cleaner would be if Emacs
> could read the mouse input from a separate file descriptor.  We could
> then set that file descriptor to use a different encoding.  Of course,
> that would need low-level changes in Emacs, even if it is possible in
> xterm.
What I had in mind was simpler:

Right now we have input -> coding system -> input method -> read-char -> 
input-decode-map -> ... other keymaps galore ... -> read-key

read-key is nice because it comes after all the interpretation is 
complete (high-level). Read-char is nice for interpreting key sequences 
(e.g. if you're a key map, mid-level), and it makes perfect sense to 
apply coding systems first. However, for things like escape sequences 
(very low-level), which are raw bytes with a specific meaning regardless 
of locale (and even keyboard layout, for xterm), coding systems are 
unnecessary at best and harmful at worst.

I'd propose adding an input filtering mechanism, which lets packages
register functions that pick off the raw input and are expected to put
back anything they don't need. Sort of like a coding system, but
orthogonal. Then you'd have:

input -> input-filter(s) -> coding system -> input method -> read-char 
-> input-decode-map -> ... other keymaps galore ... -> read-key

That said, if coding systems can really be abused as you suggest, and 
the ordering bug could be fixed, and unread-command-events does the 
right thing, then that would have the same effect and I'd be happy to 
use it. It just seems like taking the long way around.

Ryan


* Re: Best way to intercept terminal escape sequences?
  2010-08-27 18:04     ` Ryan Johnson
@ 2010-08-27 20:38       ` Eli Zaretskii
  0 siblings, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2010-08-27 20:38 UTC (permalink / raw)
  To: Ryan Johnson; +Cc: emacs-devel

> Date: Fri, 27 Aug 2010 20:04:48 +0200
> From: Ryan Johnson <ryanjohn@ece.cmu.edu>
> CC: emacs-devel@gnu.org
> 
>   On 8/27/2010 5:40 PM, Eli Zaretskii wrote:
> >> Date: Fri, 27 Aug 2010 16:44:12 +0200
> >> From: Ryan Johnson <ryanjohn@ece.cmu.edu>
> >>
> >> Is there really no other way to do this?
> > You could use the :post-read-conversion attribute of a coding-system.
> > That is, define a new coding-system that has this attribute specifying
> > a function you will write.  That function will first decode the mouse
> > stuff, and then decode the rest by the terminal-coding-system set by
> > the user.
> > You can see an example of this in ctext-with-extensions.  It is
> > defined in mule-conf.el and its post-read-conversion function is
> > defined in mule.el.
> That's actually what I tried first, but as I mentioned in the OP, emacs 
> doesn't always deliver characters to :post-read-conversion in the 
> correct order, which makes it impossible to do anything reliable with 
> either escape sequences or utf-8. See 
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6920 for more details.

You need to base your system on raw-text or no-conversion.  Then the
bytes will arrive unscathed.

> Right now we have input -> coding system -> input method -> read-char -> 
> input-decode-map -> ... other keymaps galore ... -> read-key
> 
> read-key is nice because it comes after all the interpretation is 
> complete (high-level). Read-char is nice for interpreting key sequences 
> (e.g. if you're a key map, mid-level), and it makes perfect sense to 
> apply coding systems first. However, for things like escape sequences 
> (very low-level), which are raw bytes with a specific meaning regardless 
> of locale (and even keyboard layout, for xterm), coding systems are 
> unnecessary at best and harmful at worst.

Well, mouse events aren't supposed to arrive by way of keyboard input,
either, you know.  They are supposed to come from a totally different
API.




* Re: Best way to intercept terminal escape sequences?
  2010-08-27 15:40   ` Eli Zaretskii
  2010-08-27 18:04     ` Ryan Johnson
@ 2010-08-27 23:54     ` Stefan Monnier
  2010-08-28  7:54       ` Ryan Johnson
  1 sibling, 1 reply; 21+ messages in thread
From: Stefan Monnier @ 2010-08-27 23:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ryan Johnson, emacs-devel

> I think the only way that is easier and cleaner would be if Emacs
> could read the mouse input from a separate file descriptor.  We could

Note that under older Emacsen, read-event did not obey the
keyboard-coding-system at all: it only applied to read-key-sequence.
So maybe we should simply change read-event not to try decoding
keyboard input.


        Stefan





* Re: Best way to intercept terminal escape sequences?
  2010-08-27 23:54     ` Stefan Monnier
@ 2010-08-28  7:54       ` Ryan Johnson
  2010-08-28 14:47         ` Stefan Monnier
  0 siblings, 1 reply; 21+ messages in thread
From: Ryan Johnson @ 2010-08-28  7:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

  On 8/28/2010 1:54 AM, Stefan Monnier wrote:
>> I think the only way that is easier and cleaner would be if Emacs
>> could read the mouse input from a separate file descriptor.  We could
> Note that under older Emacsen, read-event did not obey the
> keyboard-coding-system at all: it only applied to read-key-sequence.
> So maybe we should simply change read-event not to try decoding
> keyboard input.
That might make sense... the caller could always apply the coding system 
manually before dumping things back in the unread-command-events queue.

Who currently uses read-* that might be affected? xt-mouse.el would love 
it, mouse.el certainly won't care, and other xterm processing will be 
indifferent.

BTW, I've been playing with read-key and it's perfect for making 
mouse.el and xt-mouse.el play nice together! I'm a tad unclear on the 
difference between read-key and read-key-sequence, though, other than 
the latter letting you supply a minibuffer prompt.

Ryan


* Re: Best way to intercept terminal escape sequences?
  2010-08-28  7:54       ` Ryan Johnson
@ 2010-08-28 14:47         ` Stefan Monnier
  2010-08-28 20:34           ` Ryan Johnson
  2010-08-31 23:12           ` Ryan Johnson
  0 siblings, 2 replies; 21+ messages in thread
From: Stefan Monnier @ 2010-08-28 14:47 UTC (permalink / raw)
  To: Ryan Johnson; +Cc: Eli Zaretskii, emacs-devel

> That might make sense... the caller could always apply the coding system
> manually before dumping things back in the unread-command-events queue.

Or coding-system decoding should be applied to events from
unread-command-events.

> Who currently uses read-* that might be affected? xt-mouse.el would love it,
> mouse.el certainly won't care, and other xterm processing will
> be indifferent.

As mentioned, read-event did not obey keyboard-coding-system in
earlier Emacsen, so any affected package is more likely to be fixed than
broken by making a change that reverts to this previous behavior.

> BTW, I've been playing with read-key and it's perfect for making mouse.el
> and xt-mouse.el play nice together! I'm a tad unclear on the difference
> between read-key and read-key-sequence, though, other than the latter
> letting you supply a minibuffer prompt.

read-key only reads a single event.  I.e. only C-x not C-x C-a.
This event might be the result of processing several raw events
(e.g. via keyboard-coding-system, input-decode-map, ...).

        Stefan


* Re: Best way to intercept terminal escape sequences?
  2010-08-28 14:47         ` Stefan Monnier
@ 2010-08-28 20:34           ` Ryan Johnson
  2010-08-31 23:12           ` Ryan Johnson
  1 sibling, 0 replies; 21+ messages in thread
From: Ryan Johnson @ 2010-08-28 20:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

  On 8/28/2010 4:47 PM, Stefan Monnier wrote:
>> That might make sense... the caller could always apply the coding system
>> manually before dumping things back in the unread-command-events queue.
> Or coding-system decoding should be applied to events from
> unread-command-events.
That would basically be the proposed 'read-byte' behavior, then? Where 
read-char can filter out the raw inputs, and anything that it puts back 
goes through the whole input processing chain? Works for me!
>> Who currently uses read-* that might be affected? xt-mouse.el would love it,
>> mouse.el certainly won't care, and other xterm processing will
>> be indifferent.
> As mentioned, read-event did not obey keyboard-coding-system in
> earlier Emacsen, so any affected package is more likely to be fixed than
> broken by making a change that reverts to this previous behavior.
Assuming this does get fixed, the xterm mouse events will be vastly more 
reliable.

Meanwhile, I've got a patch mostly ready which makes xt-mouse.el behave 
much more like native mouse. Xterm has a mode which sends mouse motion 
events whenever a button is down, and making mouse.el use read-key lets 
me send those events separately. The result is that clicking and 
dragging are highly responsive, where before you didn't see anything 
happen until after button release. The visual feedback is really helpful 
when dragging to highlight text.

I should also be able to add support for multi-click by emulating the 
behavior described in the elisp manual. I kind of hoped the emacs core 
would do that for me, given a stream of mouse-down and mouse-up events, 
but it doesn't. Oh well... some more code to write but nothing terrible. 
The only thing that would be missing is support for track-mouse and 
mouse-face, because they're hardwired in C.

It would be really nice if there were a way to hook into the native 
mouse subsystem rather than reinventing the wheel... it already computes 
what window/buffer a given coordinate is in, grabs timestamps, detects 
multi-click, and implements track-mouse/mouse-face. Assuming proper 
hooks were available, xterm does have a full mouse-tracking mode which 
we could use. That's way over my head to tackle, though.

Should I clean up the current patch, try to add multi-click, or see if 
somebody wants to expose some native mouse hooks?

Ryan


* Re: Best way to intercept terminal escape sequences?
  2010-08-28 14:47         ` Stefan Monnier
  2010-08-28 20:34           ` Ryan Johnson
@ 2010-08-31 23:12           ` Ryan Johnson
  2010-09-02 10:53             ` Stefan Monnier
  1 sibling, 1 reply; 21+ messages in thread
From: Ryan Johnson @ 2010-08-31 23:12 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

  On 8/28/2010 4:47 PM, Stefan Monnier wrote:
>> Who currently uses read-* that might be affected? xt-mouse.el would love it,
>> mouse.el certainly won't care, and other xterm processing will
>> be indifferent.
> As mentioned, read-event did not obey keyboard-coding-system in
> earlier Emacsen, so any affected package is more likely to be fixed than
> broken by making a change that reverts to this previous behavior.
Hmm... here's a twist: The elisp docs under keymaps -> translation 
keymaps explain that:

If you have enabled keyboard character set decoding using
`set-keyboard-coding-system', decoding is done after the translations
listed above.  See Terminal I/O Encoding.  However, in future Emacs
versions, character set decoding may be done at an earlier stage.

However, the same info node admits that translation keymaps may want to 
read input (which does *not* escape I/O coding). So, suppose we do the 
following:

(define-key input-decode-map
  "\M-[M"                             ; CSI M
  '(keymap                            ; matches pb
    (t keymap                         ; matches px
       (t keymap                      ; matches py
          (t . xterm-mouse-translate)))))

In theory, the above matches any three characters following the start of
the mouse escape sequence. Then, inside xterm-mouse-translate,
(this-command-keys) comes close to being raw bytes. It now works great
for any px I can throw at it, but something still goes wrong for py > #x7f.

If I print (this-command-keys-vector) after a mouse click at (0 . 95), I 
get: [27 91 77 32 33 4194176] -- mouse-down -- and then emacs hangs 
waiting for more input; the next key I type ends up prefixed by \200.  
The lossage buffer shows ESC [ M SPC ! \300\200 ESC [ M # ! \300\200, 
but I don't know where the 'ESC [ M # ! \300' part disappeared to -- it 
doesn't get inserted into any buffer and yet xterm-mouse-translate never 
gets called, either.
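
A small sketch for reading the vector above. Emacs represents a raw
8-bit byte as a character in the range #x3fff80 to #x3fffff, and
4194176 is #x3fff80, i.e. the byte #x80. That is exactly the py byte
for row 95 (95 + 33 = 128). The helper name is hypothetical, and this
only explains the last element of the vector, not the hang:

```elisp
(defun my-event-to-byte (ev)
  "Return the byte behind EV if EV is a raw-byte character, else EV."
  (if (and (integerp ev)
           (>= ev #x3fff80)
           (<= ev #x3fffff))
      (- ev #x3fff00)
    ev))

;; The mouse-down vector printed above, reduced to plain bytes.
(mapcar #'my-event-to-byte [27 91 77 32 33 4194176])
```

The result is (27 91 77 32 33 128): ESC [ M, button byte 32, px = 33
(column 0) and py = 128 (row 95), so the down event itself arrived
intact.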

The docs seem out of date about where coding systems kick in... 
apparently it still tries to decode utf-8 somehow, even though I never 
call read-*.

Even if I (set-keyboard-coding-system 'no-conversion), the bytes turn 
into all kinds of strange M- and C- versions of characters, which is no 
fun to disentangle. 'raw-text and 'binary give different but equally 
not-fun sets of weirdness.

I'm beginning to think there's actually no way to get raw bytes from the 
terminal...

Ideas?
Ryan

* Re: Best way to intercept terminal escape sequences?
  2010-08-31 23:12           ` Ryan Johnson
@ 2010-09-02 10:53             ` Stefan Monnier
  2010-09-02 12:33               ` Ryan Johnson
       [not found]               ` <jwvy6bfdp23.fsf-monnier+emacs@gnu.org>
  0 siblings, 2 replies; 21+ messages in thread
From: Stefan Monnier @ 2010-09-02 10:53 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Eli Zaretskii, Ryan Johnson, emacs-devel

> As mentioned, read-event did not obey keyboard-coding-system in
> earlier Emacsen, so any affected package is more likely to be fixed than
> broken by making a change that reverts to this previous behavior.

Handa, could you take a look at the feasibility of moving the
decode_keyboard_code to a later stage such that read-event still returns
raw bytes for ttys?

There is a tension here, because raw events in GUIs are already decoded,
whereas raw events in ttys are just bytes.  You "fixed it" by decoding
tty input directly in tty_read_avail_input, so that read-event now
always returns decoded input, but that in turn means that read-event
doesn't return raw events any more.  The decoding is desirable for
read-key-sequence (and maybe also for read-char, tho I don't care much
about this case since read-key is generally a better replacement) but
not for read-event, since access to raw events is important for things
like xt-mouse.el.

> Hmm... here's a twist: The elisp docs under keymaps -> translation keymaps
> explain that:

> If you have enabled keyboard character set decoding using
> `set-keyboard-coding-system', decoding is done after the translations
> listed above.  See Terminal I/O Encoding.  However, in future Emacs
> versions, character set decoding may be done at an earlier stage.

This doc is out of date, indeed.

        Stefan

* Re: Best way to intercept terminal escape sequences?
  2010-09-02 10:53             ` Stefan Monnier
@ 2010-09-02 12:33               ` Ryan Johnson
       [not found]               ` <jwvy6bfdp23.fsf-monnier+emacs@gnu.org>
  1 sibling, 0 replies; 21+ messages in thread
From: Ryan Johnson @ 2010-09-02 12:33 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, Kenichi Handa

  On 9/2/2010 12:53 PM, Stefan Monnier wrote:
>> As mentioned, read-event did not obey keyboard-coding-system in
>> earlier Emacsen, so any affected package is more likely to be fixed than
>> broken by making a change that reverts to this previous behavior.
> Handa, could you take a look at the feasibility of moving the
> decode_keyboard_code to a later stage such that read-event still returns
> raw bytes for ttys?
>
> There is a tension here, because raw events in GUIs are already decoded,
> whereas raw events in ttys are just bytes.  You "fixed it" by decoding
> tty input directly in tty_read_avail_input, so that read-event now
> always returns decoded input, but that in turn means that read-event
> doesn't return raw events any more.  The decoding is desirable for
> read-key-sequence (and maybe also for read-char, tho I don't care much
> about this case since read-key is generally a better replacement) but
> not for read-event, since access to raw events is important for things
> like xt-mouse.el.
By the way, I noticed while working on something else [1] that read-key 
cannot actually replace all uses of read-event because the latter 
supports timeouts while the former does not. Would it be possible to add 
a timeout to read-key as a third (optional) parameter? I don't know 
whether read-key-delay would provide workarounds to some timeout uses, 
but it seems brittle.
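
One possible stopgap, sketched under the assumption that wrapping read-key
in `with-timeout' is acceptable: my/read-key-with-timeout is a
hypothetical helper, not an existing Emacs function, and it has not been
checked against every read-key corner case.

```elisp
(defun my/read-key-with-timeout (seconds &optional prompt)
  "Like `read-key', but return nil if no key arrives within SECONDS.
A hypothetical helper, not part of Emacs."
  ;; `with-timeout' aborts the body with a throw once SECONDS elapse,
  ;; then evaluates the timeout forms (here just nil).
  (with-timeout (seconds nil)
    (read-key prompt)))
```

A caller would use (my/read-key-with-timeout 0.5 "Key: ") and treat nil
as "timed out". A timeout that fires in the middle of a multi-byte escape
sequence may leave the rest of that sequence in the input queue, so this
is no substitute for a real timeout argument.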

>> If you have enabled keyboard character set decoding using
>> `set-keyboard-coding-system', decoding is done after the translations
>> listed above.  See Terminal I/O Encoding.  However, in future Emacs
>> versions, character set decoding may be done at an earlier stage.
> This doc is out of date, indeed.

It would be really nice to have, somewhere in the emacs docs, a diagram 
showing what processing happens to keyboard input, starting from raw 
bytes and UI events, and tracing them (or their translations) through 
coding systems, input methods, command loop, various keymaps, etc. and 
showing where in that process the different read-* functions intercept 
that data (and where the various unread-*-events reinsert things). A 
similar diagram for reading and writing files would probably also be useful.

This would not only make it easier to figure out how to interface new 
code with emacs, it would probably expose gaps in the API which make 
existing code unnecessarily complex and certain features impossible 
(mouse.el and xt-mouse.el suffer from both of those latter problems).

Unfortunately, even after spending so long on this problem I don't think 
I know enough to generate that diagram...

Ryan

[1] http://www.ece.cmu.edu/~ryanjohn/sticky-control.el

[parent not found: <jwvy6bfdp23.fsf-monnier+emacs@gnu.org>]

* Re: Best way to intercept terminal escape sequences?
       [not found]               ` <jwvy6bfdp23.fsf-monnier+emacs@gnu.org>
@ 2010-09-07  0:32                 ` Kenichi Handa
  2010-09-08  9:05                   ` Stefan Monnier
  0 siblings, 1 reply; 21+ messages in thread
From: Kenichi Handa @ 2010-09-07  0:32 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Sorry, I have not read this thread.

In article <jwvy6bfdp23.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> As mentioned, read-event did not obey keyboard-coding-system in
>>> earlier Emacsen, so any affected package is more likely to be fixed than
>>> broken by making a change that reverts to this previous behavior.
> > Handa, could you take a look at the feasibility of moving the
> > decode_keyboard_code to a later stage such that read-event still returns
> > raw bytes for ttys?

> ping?

"... read-event still returns raw bytes" means that you
can't get, for instance, A-ogonek event by read-event (or
read-char) even if you type A-ogonek from a terminal.  I
don't remember well, but one reason for moving
keyboard-coding-system handling from the keymap to the current
place was to make read-event on a tty work the same way as
that on a graphic terminal.  So, I think we should not change
it.

Is it difficult to make new functions, say tty-getc and
tty-ungetc, to handle responding escape sequences sent from
the terminal?

---
Kenichi Handa
handa@m17n.org

* Re: Best way to intercept terminal escape sequences?
  2010-09-07  0:32                 ` Kenichi Handa
@ 2010-09-08  9:05                   ` Stefan Monnier
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Monnier @ 2010-09-08  9:05 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

>>>> As mentioned, read-event did not obey keyboard-coding-system in
>>>> earlier Emacsen, so any affected package is more likely to be fixed than
>>>> broken by making a change that reverts to this previous behavior.
>> > Handa, could you take a look at the feasibility of moving the
>> > decode_keyboard_code to a later stage such that read-event still returns
>> > raw bytes for ttys?
>> ping?
> "... read-event still returns raw bytes" means that you can't get, for
> instance, A-ogonek event by read-event (or read-char) even if you type
> A-ogonek from a terminal.  I don't remember well but one reason for
> moving keyboard-coding-system handling from keymap to the current
> place was to make read-event on tty work the same way as that on
> graphic terminal.  So, I think we should not change it.

IIRC the reason to decode earlier was so that things like
input-decode-map and friends get to see real chars rather than bytes,
but none of those reasons require read-event to always return chars.

IIRC one of the main problems with earlier code was that encoded-kb
would sometimes see bytes and sometimes chars (i.e. it would normally
see bytes under a tty except when using leim), so it sometimes
incorrectly took chars for bytes and re-decoded them.

> Is it difficult to make new functions, say tty-getc and tty-ungetc, to
> handle responding escape sequences sent from terminal?

It doesn't look easy at all, since the current code does decoding before
placing the bytes in the event-queue (i.e. by the time Elisp code gets
a chance to look at the queue, the decoding has already taken place).

It would be better to decode later (i.e. after placing the events on
the event-queue), tho still before leim and input-decode-map.

We can then easily provide a "new" function to read "decoded events".
We could call it ... read-char ;-)

        Stefan

[parent not found: <20100827112348.5023B3D5@osgood.ece.cmu.edu>]

* Re: Best way to intercept terminal escape sequences?
       [not found] <20100827112348.5023B3D5@osgood.ece.cmu.edu>
@ 2010-08-27 13:56 ` Ryan Johnson
  2010-08-27 14:17   ` David Kastrup
  0 siblings, 1 reply; 21+ messages in thread
From: Ryan Johnson @ 2010-08-27 13:56 UTC (permalink / raw)
  To: emacs-devel

  On Fri, 27 Aug 2010 12:36:52 +0200, David Kastrup wrote:
> Ryan Johnson<ryanjohn@ece.cmu.edu>  writes
>> I tried setting the keyboard-coding-system to iso-latin-1, but no
>> luck.
> You should set it to raw-text, do your mouse code preprocessing, and
> afterwards decode the remainder using the intended coding system.
Like this?

(defun xterm-mouse-event-read ()
   (let ((c (read-char)))
     (cond
      ;; out-of-bounds values come back as zero
      ((eq c 0) #x100)
      ;; 8-bit characters come back weird
      ((> c (unibyte-char-to-multibyte #xff))
       (+ #x80 (logand #xff c)))
      ((> c #xff)
       (multibyte-char-to-unibyte c))
      ;; normal 7-bit character
      (c))))

(defun xterm-mouse-pos-read ()
   (let ((old-coding (keyboard-coding-system)))
     (set-keyboard-coding-system 'raw-text)
     (unwind-protect
         (cons (xterm-mouse-event-read) (xterm-mouse-event-read))
       (set-keyboard-coding-system old-coding))))

That sort of works, but has two major problems:

First, it's unusably slow. The terminal appears to do several full 
redraw operations with every mouse click, which takes several tenths of 
a second.

Second, it's buggy. If I release the mouse button during that 
flickering, there's a very good chance for things to go very wrong. 
Looking at the lossage buffer for a double-click-gone-bad at (235 . 117) 
gave:

ESC [ M SPC \301\253 u
ESC [ M # u
ESC
ESC [ M # u
C-g

That's a mouse-down, followed by part of a mouse-up (missing px), 
followed by a lone ESC, followed by another partial mouse-up, followed 
by the keyboard-quit I sent when I saw this minibuffer prompt:

ESC [ M # u-

In between each line was a raw-text --> utf-8-unix --> raw-text 
transition. Does something fail to release buffered-up characters when 
it gets swapped out? Or might this be related to bug #6920 in some way?

Ryan


* Re: Best way to intercept terminal escape sequences?
  2010-08-27 13:56 ` Ryan Johnson
@ 2010-08-27 14:17   ` David Kastrup
  0 siblings, 0 replies; 21+ messages in thread
From: David Kastrup @ 2010-08-27 14:17 UTC (permalink / raw)
  To: emacs-devel

Ryan Johnson <ryanjohn@ece.cmu.edu> writes:

>  On Fri, 27 Aug 2010 12:36:52 +0200, David Kastrup wrote:
>> Ryan Johnson<ryanjohn@ece.cmu.edu>  writes
>>> I tried setting the keyboard-coding-system to iso-latin-1, but no
>>> luck.
>> You should set it to raw-text, do your mouse code preprocessing, and
>> afterwards decode the remainder using the intended coding system.
> Like this?
>
> (defun xterm-mouse-event-read ()
>   (let ((c (read-char)))
>     (cond
>      ;; out-of-bounds values come back as zero
>      ((eq c 0) #x100)
>      ;; 8-bit characters come back weird
>      ((> c (unibyte-char-to-multibyte #xff))
>       (+ #x80 (logand #xff c)))
>      ((> c #xff)
>       (multibyte-char-to-unibyte c))
>      ;; normal 7-bit character
>      (c))))

I don't see that you are setting the 

> (defun xterm-mouse-pos-read ()
>   (let ((old-coding (keyboard-coding-system)))
>     (set-keyboard-coding-system 'raw-text)
>     (unwind-protect
>         (cons (xterm-mouse-event-read) (xterm-mouse-event-read))
>       (set-keyboard-coding-system old-coding))))

You can't go setting the keyboard reading system back and forth.  It
operates into a buffer even before calling read-char.  You have to put
it to raw and stick with it.
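
The "decode the remainder afterwards" half of this advice could look
roughly like the sketch below. It assumes the mouse bytes have already
been stripped out and that what is left is available as a list of byte
values; my/decode-raw-bytes is a hypothetical helper name.

```elisp
(defun my/decode-raw-bytes (bytes coding)
  "Decode BYTES, a list of integers 0..255, using coding system CODING."
  ;; Build a unibyte string from the bytes, then decode it in one step.
  (decode-coding-string (apply #'unibyte-string bytes) coding))

;; The UTF-8 bytes C4 84 encode U+0104, LATIN CAPITAL LETTER A WITH OGONEK.
(my/decode-raw-bytes '(#xC4 #x84) 'utf-8)
;; => "Ą"
```

Decoding a complete chunk at once avoids switching the keyboard coding
system per event, which is the part that goes wrong above.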

> First, it's unusably slow.

It would probably be best to write a CCL program for that sort of
thing.

-- 
David Kastrup


* Best way to intercept terminal escape sequences?
@ 2010-08-26 13:22 Ryan Johnson
  0 siblings, 0 replies; 21+ messages in thread
From: Ryan Johnson @ 2010-08-26 13:22 UTC (permalink / raw)
  To: emacs-devel

  Hi all,

Terminal escape sequences cause emacs a lot of pain, with problems like 
bug #6758 and the contortions performed by xt-mouse.el being just two 
examples.

I've been wrestling with xt-mouse.el for the last week and it seems to 
me that many or all of these problems arise because escape sequences are 
treated like any other string of characters when in fact they are nearly 
all encoded events of some sort. For example, the xterm escape sequence 
"ESC O D" is eventually converted to <right>, but anybody calling 
read-char or read-event will get a string of characters instead (and 
probably wish they hadn't). This is why xt-mouse can't do full mouse 
handling -- mouse-drag-track (mouse.el) calls read-event to watch for 
the mouse-up event. If xt-mouse simply translated mouse escapes into 
events, mouse-drag-track would intercept the mouse-up sequence and think 
it was garbage, with xt-mouse never having a chance to decode it. So,
instead, xt-mouse reinvents the wheel (poorly) and has to jump through 
hoops to send [down-mouse-1 move-mouse-1 mouse-1] all at once to 
simulate a mouse drag after it has finished.

Problems like this could be solved by allowing the system to process 
terminal escape sequences early in the food chain (= before read-char 
and read-event). The idea would be to let a terminal translator 
interpose on the keyboard input before anything else -- even coding 
systems -- and filter known escape sequences. The interposition would 
have two key features:

1. Recognized key sequences can be absorbed completely or (more likely) 
converted to events which appear at read-event in the proper order -- (a 
ESC O D b) would make three read-event calls return (a <right> b).
2. Unrecognized sequences are passed through, unchanged, for normal 
handling by the rest of the input processing chain -- (ESC x x) would be 
ignored
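
The two features above can be sketched as a pure function over a list of
input bytes. This is only an illustration of the intended behavior, not a
proposal for where it would live in keyboard.c; the table and all my/
names are hypothetical, and a real translator would also have to cope
with sequences that arrive in pieces.

```elisp
(defvar my/escape-table
  '(((27 79 65) . up)      ; ESC O A
    ((27 79 66) . down))   ; ESC O B
  "Hypothetical table mapping escape sequences (lists of bytes) to events.")

(defun my/prefix-p (prefix list)
  "Return non-nil if PREFIX is a prefix of LIST."
  (while (and prefix list (equal (car prefix) (car list)))
    (setq prefix (cdr prefix)
          list (cdr list)))
  (null prefix))

(defun my/translate-escapes (input)
  "Return INPUT with sequences from `my/escape-table' replaced by events.
Anything else, including unknown escape sequences, passes through as is."
  (let (out)
    (while input
      (let ((entry (catch 'hit
                     (dolist (e my/escape-table)
                       (when (my/prefix-p (car e) input)
                         (throw 'hit e))))))
        (if entry
            ;; Feature 1: a recognized sequence becomes a single event.
            (progn
              (push (cdr entry) out)
              (setq input (nthcdr (length (car entry)) input)))
          ;; Feature 2: unrecognized input is passed through unchanged.
          (push (car input) out)
          (setq input (cdr input)))))
    (nreverse out)))

(my/translate-escapes '(?a 27 79 65 ?b))   ; => (97 up 98)
(my/translate-escapes '(27 ?x ?x))         ; => (27 120 120)
```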

With this interposition in place, terminal events could become first 
class citizens. Function key presses would appear to *everyone* as such, 
mouse events would be processed by the normal mouse-handling code, etc.

The input-decode-map introduced by v23 is a nice start (it can absorb 
the sequence or generate events as needed), but it doesn't pass through 
unrecognized sequences and read-char/read-event bypass it.

Unfortunately read-* are written in C and can't be advised effectively 
(I tried but the warnings were accurate). Input methods don't seem to 
work, even if I knew how to create one (they receive printable chars 
only and the read-* have to ask for them). I also tried creating a 
keyboard coding system to intercept terminal escape sequences as a 
proof-of-concept, but was thwarted by bug #6920 (and that's not the 
right place for it anyway).

Alternatively, I toyed with the idea of making mouse.el install an 
overriding-terminal-local-map (to silently ignore everything except ^G 
and mouse events) instead of calling read-event, but that would require 
significant rewriting and isn't a general fix for the escape sequence 
problem. I tried source diving for more ideas, but read_filtered_event 
(lread.c) calls read_char (keyboard.c), which is 900 lines of Greek to me.

Thoughts? What would be the least intrusive way to support escape 
sequences better? Or is there a better way mouse.el should process 
events so the existing input-decode-map becomes effective?

(please CC me in all replies -- I'm signed up for daily digests)

Thanks,
Ryan

Thread overview: 21+ messages
2010-08-26 15:58 Best way to intercept terminal escape sequences? Ryan Johnson
2010-08-26 23:03 ` Stefan Monnier
2010-08-27  9:28   ` Ryan Johnson
2010-08-27 10:36     ` David Kastrup
2010-08-27 23:50       ` Stefan Monnier
     [not found] <20100827142724.E1DD712F@hazard.ece.cmu.edu>
2010-08-27 14:44 ` Ryan Johnson
2010-08-27 15:40   ` Eli Zaretskii
2010-08-27 18:04     ` Ryan Johnson
2010-08-27 20:38       ` Eli Zaretskii
2010-08-27 23:54     ` Stefan Monnier
2010-08-28  7:54       ` Ryan Johnson
2010-08-28 14:47         ` Stefan Monnier
2010-08-28 20:34           ` Ryan Johnson
2010-08-31 23:12           ` Ryan Johnson
2010-09-02 10:53             ` Stefan Monnier
2010-09-02 12:33               ` Ryan Johnson
     [not found]               ` <jwvy6bfdp23.fsf-monnier+emacs@gnu.org>
2010-09-07  0:32                 ` Kenichi Handa
2010-09-08  9:05                   ` Stefan Monnier
     [not found] <20100827112348.5023B3D5@osgood.ece.cmu.edu>
2010-08-27 13:56 ` Ryan Johnson
2010-08-27 14:17   ` David Kastrup
  -- strict thread matches above, loose matches on Subject: below --
2010-08-26 13:22 Ryan Johnson
