From: Ryan Johnson <ryanjohn@ece.cmu.edu>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-devel@gnu.org
Subject: Re: Best way to intercept terminal escape sequences?
Date: Fri, 27 Aug 2010 11:28:21 +0200
Message-ID: <4C778535.9020206@ece.cmu.edu>
In-Reply-To: <jwv1v9lq8vd.fsf-monnier+emacs@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 2587 bytes --]
On 8/27/2010 1:03 AM, Stefan Monnier wrote:
>> all encoded events of some sort. For example, the xterm escape sequence
>> "ESC O D" is eventually converted to <right>, but anybody calling
>> read-char or read-event will get a string of characters instead (and
>> probably wish they hadn't).
> Yes, that's why they should use `read-key' instead.
!!
I didn't know about that function. I tried s/read-event/read-key/ in
mouse.el, and nothing obvious broke. However, I'll need to modify
xt-mouse.el to take advantage of the change -- it still doesn't trust
mouse.el.
Meanwhile, though, I'm hitting another problem which still seems to
require a lower-than-low-level equivalent for read-char which can bypass
coding systems.
For reasons I don't understand, xterm's mouse escape sequence is
completely non-standard: "ESC [ M pb px py". The p* are bytes taking
values between 33 and 255, and there is no terminator byte.***
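To make the byte layout concrete, here is a minimal sketch of decoding one such report when the six raw bytes are available undecoded. It is written in Python rather than Emacs Lisp so it can be run outside Emacs; the function name `parse_xterm_mouse` is made up for illustration, and the offsets (32 for every parameter, plus 1 for the 1-based coordinates) are taken from the arithmetic in `xterm-mouse-event` in the attached patch.

```python
from typing import Tuple


def parse_xterm_mouse(report: bytes) -> Tuple[int, int, int]:
    """Parse a raw 'ESC [ M pb px py' report into (button, x, y).

    Hypothetical helper: assumes the caller already has exactly the
    six raw bytes, which is the part Emacs cannot guarantee once a
    coding system has decoded the input.
    """
    if len(report) != 6 or report[:3] != b"\x1b[M":
        raise ValueError("not an xterm mouse report")
    pb, px, py = report[3], report[4], report[5]
    # Every parameter is offset by 32 (#o40); coordinates are 1-based,
    # so one more is subtracted to get 0-based columns and rows.
    return pb - 0o40, px - 0o40 - 1, py - 0o40 - 1


# Button 0 pressed in the top-left cell: pb = SPC, px = py = '!'.
button, x, y = parse_xterm_mouse(b"\x1b[M !!")
```

Because there is no terminator, the only way to find the end of the report is to count three bytes after "ESC [ M", which only works on undecoded input.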
The result of this unfortunate design decision is that px and py can
throw a huge wrench in emacs' utf-8 decoding, because a (px py) pair can
look like all kinds of utf-8 (valid or otherwise). As long as px < 0xe0
*and* py < 0xC0 I can reliably decompose the unicode char I'm given into
the original utf-8 sequence (possibly in the illegal C0 or C1 range) and
py follows immediately to satisfy the decoder. The rest of the time,
though, the lack of a sequence terminator leaves me at the decoder's
mercy to decide how to interpret the bytes, and it's at the mercy of
whatever input follows the mouse sequence.
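The two situations described above can be reproduced with any UTF-8 decoder. The following Python sketch is only an illustration, not what Emacs does internally: `split_two_byte_char` is a hypothetical name that reuses the arithmetic from the attached patch, and the byte values are arbitrary examples.

```python
import codecs


def split_two_byte_char(c: int) -> tuple:
    """Recover the two bytes of a 2-byte UTF-8 sequence from code point c.

    Same arithmetic as the patch: (+ #xc0 (lsh c -6)) for the lead byte
    and (+ #x80 (logand c #x3f)) for the continuation byte.
    """
    if not 0 <= c < 0x800:
        raise ValueError("code point does not fit in two UTF-8 bytes")
    return 0xC0 + (c >> 6), 0x80 + (c & 0x3F)


# Case 1: (px py) happens to form a complete 2-byte sequence. The
# decoder hands back a single character, which can be split again.
px, py = 0xD9, 0xA9
decoded = bytes([px, py]).decode("utf-8")
recovered = split_two_byte_char(ord(decoded))

# Case 2: px looks like the lead byte of a 3-byte sequence. An
# incremental decoder returns nothing and waits for a third byte that
# xterm never sends.
decoder = codecs.getincrementaldecoder("utf-8")()
pending = decoder.decode(bytes([0xE2, 0x82]))
```

In the first case `recovered` equals `(px, py)` again; in the second, `pending` is the empty string, which corresponds to Emacs sitting at the prompt until more input arrives.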
I tried setting the keyboard-coding-system to iso-latin-1, but no luck.
For down-mouse and move-mouse, things work out all right. A mouse-up or
mouse-move always follows and the leading ESC makes emacs give up and
hand over the raw bytes. The up-mouse sequence, though, doesn't usually
have anything after it, so emacs waits patiently for more input.
Clicking at position (184 . 136), for example, leaves the minibuffer
prompt: "ESC [ M SPC \300\270 \300\210 ESC [ M #-". Fortunately, ^G
works as expected, so it's merely annoying. This is a huge improvement
over the current practice of dumping a bunch of garbage bytes into the
user's buffer, so I'm attaching a preliminary patch (against the
emacs-23.2 release, btw).
Thoughts?
Ryan
*** It *really* should have been something like "ESC [ M pb ; px ; py m"
with the p* being string representations of integers. I'm discussing
possible fixes with the xterm maintainer, but it could be a while (and
would require creating a new mouse mode to avoid breaking existing apps).
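For comparison, a report in the format suggested in this footnote could be parsed without any byte counting. The Python sketch below is purely hypothetical: no terminal is claimed to emit this exact sequence, `parse_proposed` is an invented name, and the parameters are returned as sent because the footnote does not say whether any offset would apply.

```python
import re

# "ESC [ M pb ; px ; py m" with pb, px, py written as decimal integers.
PROPOSED = re.compile(rb"\x1b\[M(\d+);(\d+);(\d+)m")


def parse_proposed(data: bytes):
    """Return ((pb, px, py), remaining_bytes), or None if no match.

    The trailing 'm' marks the end of the report, so the parser never
    has to guess how many bytes belong to it.
    """
    match = PROPOSED.match(data)
    if match is None:
        return None
    pb, px, py = (int(group) for group in match.groups())
    return (pb, px, py), data[match.end():]


result = parse_proposed(b"\x1b[M0;184;136mabc")
```

Every byte in such a report is plain ASCII, so a UTF-8 decoder would pass it through unchanged, and the terminator separates the report cleanly from whatever input follows.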
[-- Attachment #2: xt-mouse.el.utf-8.patch --]
[-- Type: text/plain, Size: 2464 bytes --]
--- xt-mouse.new.el 2010-08-27 09:22:47.640625000 +0200
+++ xt-mouse.utf-8.el 2010-08-27 11:02:20.031250000 +0200
@@ -120,16 +120,36 @@
       pos)
 
 ;; read xterm sequences above ascii 127 (#x7f)
-(defun xterm-mouse-event-read ()
-  (let ((c (read-char)))
+(defun xterm-mouse-pos-read ()
+  "Read positions from an xterm mouse escape sequence.
+
+This job is complicated because xterm can emit (px py) pairs
+which look like (possibly invalid) UTF-8 sequences which emacs
+dutifully decodes into unicode characters.
+
+UTF-8 sequencing can occur any time px and py are both greater
+than #x80. For terminals smaller than 128x128, this function can
+correct the problem because only px can trigger the
+confusion. Terminals taller than 128 lines pose a more difficult
+problem because the py can look like the start of a 2-byte UTF-8
+sequence, and xterm sends no sequence terminator which we could
+use to detect the 'EOF'. This tends to leave emacs waiting for
+input after mouse-up."
+  (let* ((c (read-char))
+         (b (multibyte-char-to-unibyte c)))
     (cond
      ;; mouse clicks outside the encodable range produce 0
-     ((= c 0) #x100)
-     ;; 8-bit control characters which don't pair up with the next
-     ;; char come back as "pseudo-negative", e.g. #x3fff??
-     ((> c #x7ff) (logand c #xff))
+     ((eq b 0) (cons #x100 (multibyte-char-to-unibyte (read-char))))
+     ;; emacs treats some combinations of px py like utf-8
+     ((or
+       ;; normal ("C2") sequences don't convert back to a single byte
+       (eq b -1)
+       ;; illegal ("C0") sequences convert back to one (legal) char
+       (not (eq c (unibyte-char-to-multibyte b))))
+      ;; pick the char apart into two bytes
+      (cons (+ #xc0 (lsh c -6)) (+ #x80 (logand c #x3f))))
      ;; normal case
-     (c))))
+     ((cons b (multibyte-char-to-unibyte (read-char)))))))
 
 (defun xterm-mouse-truncate-wrap (f)
   "Truncate with wrap-around."
@@ -148,9 +168,10 @@
 
 (defun xterm-mouse-event ()
   "Convert XTerm mouse event to Emacs mouse event."
-  (let* ((type (- (xterm-mouse-event-read) #o40))
-         (x (- (xterm-mouse-event-read) #o40 1))
-         (y (- (xterm-mouse-event-read) #o40 1))
+  (let* ((type (- (read-char) #o40))
+         (pos (xterm-mouse-pos-read))
+         (x (- (car pos) #o40 1))
+         (y (- (cdr pos) #o40 1))
          ;; Emulate timestamp information. This is accurate enough
          ;; for default value of mouse-1-click-follows-link (450msec).
          (timestamp (xterm-mouse-truncate-wrap