From: Max Mikhanosha
Newsgroups: gmane.emacs.devel
Subject: Bugfix for utf-8 XTerm/MinTTY and (set-input-meta-mode t)
Date: Tue, 01 Jun 2021 16:19:40 +0000
To: "emacs-devel@gnu.org"

Emacs incorrectly handles (set-input-meta-mode t) (meta in the 8th bit of input) when the terminal is in UTF-8 mode.

Both XTerm and MinTTY, when configured to send the meta modifier as the 8th bit while in UTF-8 mode, first set the 8th bit and then encode the resulting character as UTF-8. For example, Meta-x is sent as ?x + 128 = code point 248 (0xF8), which is UTF-8 encoded as the two bytes 0xc3, 0xb8.

But Emacs handles the meta modifier in the 8th bit in tty_read_avail_input, before decoding the raw keyboard input. So it erroneously treats the 0xc3, 0xb8 input as two ordinary ASCII characters with the meta modifier set, stripping the 8th bit and garbling the input (Meta-x arrives as M-C followed by M-8).

This problem has existed for a long time and has frustrated at least a few hundred people, judging by the view count of the Stack Overflow post that comes up when searching for "emacs utf8 xterm".

The patch below fixes this bug by making 8th-bit meta key handling work correctly in UTF-8 mode.

I have tested it with XTerm and MinTTY: meta keys and meta-control keys now work correctly regardless of whether the terminals are in UTF-8 mode.
Diff against Emacs-26 branch pasted below:

diff --git a/src/coding.c b/src/coding.c
index 078c1c4e6a..743fceb32c 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -5989,6 +5989,11 @@ raw_text_coding_system_p (struct coding_system *coding)
           && coding->encoder == encode_coding_raw_text) ? true : false;
 }
 
+bool utf_8_input_coding_system_p(struct coding_system *coding)
+{
+  return (coding->decoder == decode_coding_utf_8) ? true : false;
+}
+
 /* If CODING_SYSTEM doesn't specify end-of-line format, return one of
    the subsidiary that has the same eol-spec as PARENT (if it is not
diff --git a/src/coding.h b/src/coding.h
index aab8c2d438..6124330a1f 100644
--- a/src/coding.h
+++ b/src/coding.h
@@ -702,6 +702,7 @@ extern Lisp_Object encode_file_name (Lisp_Object);
 extern Lisp_Object decode_file_name (Lisp_Object);
 extern Lisp_Object raw_text_coding_system (Lisp_Object);
 extern bool raw_text_coding_system_p (struct coding_system *);
+extern bool utf_8_input_coding_system_p (struct coding_system *);
 extern Lisp_Object coding_inherit_eol_type (Lisp_Object, Lisp_Object);
 extern Lisp_Object complement_process_encoding_system (Lisp_Object);
diff --git a/src/keyboard.c b/src/keyboard.c
index aa3448439b..84acf4a998 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -2235,14 +2235,16 @@ read_decoded_event_from_main_queue (struct timespec *end_time,
         return nextevt;    /* No decoding needed. */
       else
         {
+          struct coding_system *coding = TERMINAL_KEYBOARD_CODING (terminal);
+          bool utf8_input_terminal = utf_8_input_coding_system_p (coding);
           int meta_key = terminal->display_info.tty->meta_key;
+
           eassert (n < MAX_ENCODED_BYTES);
           events[n++] = nextevt;
+
           if (NATNUMP (nextevt)
-              && XINT (nextevt) < (meta_key == 1 ? 0x80 : 0x100))
+              && XINT (nextevt) < ((meta_key == 1 && !utf8_input_terminal) ? 0x80 : 0x100))
             { /* An encoded byte sequence, let's try to decode it. */
-              struct coding_system *coding
-                = TERMINAL_KEYBOARD_CODING (terminal);
 
               if (raw_text_coding_system_p (coding))
                 {
@@ -2253,12 +2255,13 @@ read_decoded_event_from_main_queue (struct timespec *end_time,
                 }
               else
                 {
+
                   unsigned char src[MAX_ENCODED_BYTES];
                   unsigned char dest[MAX_ENCODED_BYTES * MAX_MULTIBYTE_LENGTH];
                   int i;
                   for (i = 0; i < n; i++)
                     src[i] = XINT (events[i]);
-                  if (meta_key != 2)
+                  if (!utf8_input_terminal && meta_key != 2)
                     for (i = 0; i < n; i++)
                       src[i] &= ~0x80;
                   coding->destination = dest;
@@ -2275,8 +2278,21 @@ read_decoded_event_from_main_queue (struct timespec *end_time,
                       const unsigned char *p = coding->destination;
                       eassert (coding->carryover_bytes == 0);
                       n = 0;
-                      while (n < coding->produced_char)
-                        events[n++] = make_number (STRING_CHAR_ADVANCE (p));
+                      while (n < coding->produced_char)
+                        {
+                          int c = STRING_CHAR_ADVANCE (p);
+                          if (utf8_input_terminal)
+                            {
+                              /* put meta modifier on the key */
+                              int modifier = 0;
+                              if (meta_key == 1 && c < 0x100 && (c & 0x80))
+                                modifier = meta_modifier;
+                              if (meta_key != 2)
+                                c &= ~0x80;
+                              c |= modifier;
+                            }
+                          events[n++] = make_number (c);
+                        }
                     }
                 }
             }
@@ -7118,16 +7134,31 @@ tty_read_avail_input (struct terminal *terminal,
 #endif /* not MSDOS */
 #endif /* not WINDOWSNT */
 
+  bool utf8_input_terminal = utf_8_input_coding_system_p (TERMINAL_KEYBOARD_CODING(terminal));
+
   for (i = 0; i < nread; i++)
     {
       struct input_event buf;
       EVENT_INIT (buf);
       buf.kind = ASCII_KEYSTROKE_EVENT;
       buf.modifiers = 0;
-      if (tty->meta_key == 1 && (cbuf[i] & 0x80))
-        buf.modifiers = meta_modifier;
-      if (tty->meta_key != 2)
-        cbuf[i] &= ~0x80;
+
+      /* Both XTerm and MinTTY in utf8:true + MetaSendEscape:false mode
+         send Meta + ASCII letters by first adding 0x80, and then UTF-8
+         encoding the result.
+
+         Therefore trying to detect 0x80 meta key flag now not only
+         confuses meta key with UTF-8 encoding, but also loses
+         information by stripping the 8th bit from UTF-8 input before
+         decoding
+      */
+      if (!utf8_input_terminal)
+        {
+          if (tty->meta_key == 1 && (cbuf[i] & 0x80))
+            buf.modifiers = meta_modifier;
+          if (tty->meta_key != 2)
+            cbuf[i] &= ~0x80;
+        }
 
       buf.code = cbuf[i];
       /* Set the frame corresponding to the active tty. Note that the