unofficial mirror of emacs-devel@gnu.org 
From: Ryan Johnson <ryanjohn@ece.cmu.edu>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-devel@gnu.org
Subject: Re: Best way to intercept terminal escape sequences?
Date: Fri, 27 Aug 2010 11:28:21 +0200	[thread overview]
Message-ID: <4C778535.9020206@ece.cmu.edu> (raw)
In-Reply-To: <jwv1v9lq8vd.fsf-monnier+emacs@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2587 bytes --]

  On 8/27/2010 1:03 AM, Stefan Monnier wrote:
>> all encoded events of some sort. For example, the xterm escape sequence
>> "ESC O D" is eventually converted to<right>, but anybody calling
>> read-char or read-event will get a string of characters instead (and
>> probably wish they hadn't).
> Yes, that's why they should use `read-key' instead.
!!

I didn't know about that function. I tried s/read-event/read-key/ in 
mouse.el, and nothing obvious broke. However, I'll need to modify 
xt-mouse.el to take advantage of the change -- it still doesn't trust 
mouse.el.

Meanwhile, though, I'm hitting another problem which still seems to 
require a lower-than-low-level equivalent for read-char which can bypass 
coding systems.

For reasons I don't understand, xterm's mouse escape sequence is 
completely non-standard: "ESC [ M pb px py". The p* are bytes taking 
values between 33 and 255, and there is no terminator byte.***
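As a rough illustration of the framing described above, the following Python sketch decodes one such report from raw, undecoded bytes. The function name `parse_xterm_mouse` is made up for this example, and the offset of 32 (plus 1 for the coordinates) is taken from the `#o40` arithmetic in the attached patch rather than from the xterm documentation.

```python
ESC = 0x1B
PREFIX = bytes([ESC, ord("["), ord("M")])

def parse_xterm_mouse(data: bytes):
    """Decode one "ESC [ M pb px py" report from raw, undecoded bytes.

    Returns (button, x, y) with 0-based x and y, mirroring the arithmetic
    in xterm-mouse-event: subtract 32 (#o40), then 1 more for coordinates.
    """
    if len(data) < 6 or data[:3] != PREFIX:
        raise ValueError("not a complete xterm mouse report")
    pb, px, py = data[3], data[4], data[5]
    return pb - 32, px - 32 - 1, py - 32 - 1

# There is no terminator byte, so the parser can only count three bytes
# after the prefix; it has no other way to know the report has ended.
report = PREFIX + bytes([32, 33 + 10, 33 + 5])
print(parse_xterm_mouse(report))
```

Working on raw bytes like this is only possible when nothing has tried to decode them first, which is the difficulty the rest of this message is about.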

The result of this unfortunate design decision is that px and py can 
throw a huge wrench in emacs' utf-8 decoding, because a (px py) pair can 
look like all kinds of utf-8 (valid or otherwise). As long as px < 0xE0 
*and* py < 0xC0, I can reliably decompose the Unicode char I'm given into 
the original UTF-8 sequence (possibly in the illegal C0 or C1 range), and 
py follows immediately to satisfy the decoder. The rest of the time, 
though, the lack of a sequence terminator leaves me at the decoder's 
mercy to decide how to interpret the bytes, and it's at the mercy of 
whatever input follows the mouse sequence.
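The decomposition mentioned above can be sketched in Python as follows. This is only an illustration of the arithmetic the attached patch uses for the two-byte case; the helper name `split_two_byte_char` is an assumption of this example, and Python's decoder stands in for Emacs' decoder purely to show how a (px py) pair can collapse into one character.

```python
def split_two_byte_char(code_point: int):
    """Recover the two bytes of a 2-byte UTF-8 sequence from its code point.

    Same arithmetic as the patch: lead byte is 0xC0 plus the high bits,
    continuation byte is 0x80 plus the low six bits. Only meaningful for
    code points below 0x800.
    """
    if not 0 <= code_point < 0x800:
        raise ValueError("code point does not fit a 2-byte sequence")
    return 0xC0 + (code_point >> 6), 0x80 + (code_point & 0x3F)

# px = 0xC2 and py = 0xA9 are two separate coordinate bytes, but a UTF-8
# decoder sees them as the single character U+00A9.
px, py = 0xC2, 0xA9
decoded = bytes([px, py]).decode("utf-8")
print(len(decoded), split_two_byte_char(ord(decoded)) == (px, py))
```

The same arithmetic also maps a code point below 0x40 back to a pair starting with 0xC0, which corresponds to the overlong ("illegal C0") case.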

I tried setting the keyboard-coding-system to iso-latin-1, but no luck.

For down-mouse and move-mouse, things work out all right. A mouse-up or 
mouse-move always follows and the leading ESC makes emacs give up and 
hand over the raw bytes. The up-mouse sequence, though, doesn't usually 
have anything after it, so emacs waits patiently for more input. 
Clicking at position (184 . 136), for example, leaves the minibuffer 
prompt: "ESC [ M SPC \300\270 \300\210 ESC [ M #-". Fortunately, ^G 
works as expected, so it's merely annoying. This is a huge improvement 
over the current practice of dumping a bunch of garbage bytes into the 
user's buffer, so I'm attaching a preliminary patch (against the 
emacs-23.2 release, btw).
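The stall after mouse-up can be mimicked with any incremental UTF-8 decoder. The sketch below uses Python's `codecs` module as an analogy only; it says nothing about how Emacs' decoder is implemented, and the byte 0xC3 is an arbitrary example of a trailing coordinate byte that looks like a UTF-8 lead byte.

```python
import codecs

# errors="replace" is chosen for the demo so that bad input yields U+FFFD
# instead of raising; this is a Python detail, not Emacs behavior.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

# A trailing 0xC3 looks like the lead byte of a 2-byte character, so the
# decoder emits nothing and waits for a continuation byte.
pending = decoder.decode(bytes([0xC3]))
print(repr(pending))

# Only when the ESC of a following report arrives does the decoder give up
# on the dangling lead byte and release the ESC.
flushed = decoder.decode(bytes([0x1B]))
print(flushed.endswith("\x1b"))
```

With no terminator byte in the report, nothing forces that flush after a mouse-up, which matches the waiting behaviour described above.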

Thoughts?
Ryan

*** It *really* should have been something like "ESC [ M pb ; px ; py m" 
with the p* being string representations of integers. I'm discussing 
possible fixes with the xterm maintainer, but it could be a while (and 
would require creating a new mouse mode to avoid breaking existing apps).
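To show why a terminated, decimal format would be easier to handle, here is a minimal Python sketch of a parser for the form suggested in the footnote. The format is only a proposal at this point, so the grammar and the name `parse_proposed` are hypothetical and describe nothing that xterm actually emits.

```python
import re

# Hypothetical grammar for the terminated format proposed in the footnote:
# ESC [ M <pb> ; <px> ; <py> m, with each field a decimal integer.
PROPOSED = re.compile(rb"\x1b\[M(\d+);(\d+);(\d+)m")

def parse_proposed(data: bytes):
    """Return (pb, px, py) as integers, or None if data is not a full report."""
    match = PROPOSED.fullmatch(data)
    if match is None:
        return None
    return tuple(int(field) for field in match.groups())

print(parse_proposed(b"\x1b[M0;184;136m"))
print(parse_proposed(b"\x1b[M0;184;13"))  # truncated: no terminator yet
```

Every byte in such a report is plain ASCII, so no coding system could misread it, and the final `m` tells the reader exactly when the report is complete.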

[-- Attachment #2: xt-mouse.el.utf-8.patch --]
[-- Type: text/plain, Size: 2464 bytes --]

--- xt-mouse.new.el	2010-08-27 09:22:47.640625000 +0200
+++ xt-mouse.utf-8.el	2010-08-27 11:02:20.031250000 +0200
@@ -120,16 +120,36 @@
   pos)
 
 ;; read xterm sequences above ascii 127 (#x7f)
-(defun xterm-mouse-event-read ()
-  (let ((c (read-char)))
+(defun xterm-mouse-pos-read ()
+  "Read positions from an xterm mouse escape sequence.
+
+This job is complicated because xterm can emit (px py) pairs
+which look like (possibly invalid) UTF-8 sequences which emacs
+dutifully decodes into unicode characters.
+
+UTF-8 sequencing can occur any time px and py are both greater
+than #x80.  For terminals smaller than 128x128, this function can
+correct the problem because only px can trigger the
+confusion. Terminals taller than 128 lines pose a more difficult
+problem because the py can look like the start of a 2-byte UTF-8
+sequence, and xterm sends no sequence terminator which we could
+use to detect the 'EOF'. This tends to leave emacs waiting for
+input after mouse-up."
+  (let* ((c (read-char))
+	 (b (multibyte-char-to-unibyte c)))
     (cond
      ;; mouse clicks outside the encodable range produce 0
-     ((= c 0) #x100)
-     ;; 8-bit control characters which don't pair up with the next
-     ;; char come back as "pseudo-negative", e.g. #x3fff??
-     ((> c #x7ff) (logand c #xff))
+     ((eq b 0) (cons #x100 (multibyte-char-to-unibyte (read-char))))
+     ;; emacs treats some combinations of px py like utf-8
+     ((or
+       ;; normal ("C2") sequences don't convert back to a single byte
+       (eq b -1)
+       ;; illegal ("C0") sequences convert back to one (legal) char
+       (not (eq c (unibyte-char-to-multibyte b))))
+      ;; pick the char apart into two bytes
+      (cons (+ #xc0 (lsh c -6)) (+ #x80 (logand c #x3f))))
      ;; normal case
-     (c))))
+     ((cons b (multibyte-char-to-unibyte (read-char)))))))
 
 (defun xterm-mouse-truncate-wrap (f)
   "Truncate with wrap-around."
@@ -148,9 +168,10 @@
 
 (defun xterm-mouse-event ()
   "Convert XTerm mouse event to Emacs mouse event."
-  (let* ((type (- (xterm-mouse-event-read) #o40))
-	 (x (- (xterm-mouse-event-read) #o40 1))
-	 (y (- (xterm-mouse-event-read) #o40 1))
+  (let* ((type (- (read-char) #o40))
+	 (pos (xterm-mouse-pos-read))
+	 (x (- (car pos) #o40 1))
+	 (y (- (cdr pos) #o40 1))
 	 ;; Emulate timestamp information.  This is accurate enough
 	 ;; for default value of mouse-1-click-follows-link (450msec).
 	 (timestamp (xterm-mouse-truncate-wrap
