From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: npostavs@users.sourceforge.net
Newsgroups: gmane.emacs.bugs
Subject: bug#25288: 25.1; term, ansi-term, broken output of utf8 text
Date: Wed, 28 Dec 2016 21:37:19 -0500
Message-ID: <87inq38nq8.fsf@users.sourceforge.net>
References: <87r34r98ex.fsf@users.sourceforge.net> <83h95nvojh.fsf@gnu.org>
In-Reply-To: <83h95nvojh.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 28 Dec 2016 21:31:14 +0200")
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
Cc: 25288@debbugs.gnu.org, fvamail@gmail.com
To: Eli Zaretskii
X-GNU-PR-Message: followup 25288
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: confirmed
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:127536

--=-=-=
Content-Type: text/plain

tags 25288 patch
quit

Eli Zaretskii writes:

>> From: npostavs@users.sourceforge.net
>> Date: Wed, 28 Dec 2016 14:10:30 -0500
>> Cc: 25288@debbugs.gnu.org
>>
>> Is there a way to recognize incomplete decoding from lisp? I can't see
>> any.
>
> If you know the encoding of the byte stream (and term.el must, since
> it evidently decodes it later on), then you could probably use
> char-charset, after decoding: if you get 'eight-bit, then you've got
> incomplete byte sequence.
> But I didn't try that.

That should work at least for encodings like utf-8 for which undecoded
bytes are not ascii. I guess parsing of escape codes would only work on
such encodings anyway, so it should be fine.

Patch attached.

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment;
 filename=v1-0001-Handle-multibyte-chars-spanning-chunks-in-term.el.patch
Content-Description: patch

>From 6b052065c60406df5b4cd54f698f78594a010922 Mon Sep 17 00:00:00 2001
From: Noam Postavsky
Date: Wed, 28 Dec 2016 20:13:20 -0500
Subject: [PATCH v1] Handle multibyte chars spanning chunks in term.el

* lisp/term.el (term-terminal-undecoded-bytes): New variable.
(term-mode): Make it buffer local.  Don't make
`term-terminal-parameter' buffer-local twice.
(term-emulate-terminal): Check for bytes of incompletely decoded
characters, and save them until the next call when they can be fully
decoded (Bug#25288).
---
 lisp/term.el | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/lisp/term.el b/lisp/term.el
index d3d6390..696e39f 100644
--- a/lisp/term.el
+++ b/lisp/term.el
@@ -341,6 +341,7 @@ (defconst term-protocol-version "0.96")


 (eval-when-compile (require 'ange-ftp))
+(eval-when-compile (require 'cl-lib))
 (require 'ring)
 (require 'ehelp)

@@ -404,6 +405,7 @@ term-terminal-state
 (defvar term-kill-echo-list nil
   "A queue of strings whose echo we want suppressed.")
 (defvar term-terminal-parameter)
+(defvar term-terminal-undecoded-bytes nil)
 (defvar term-terminal-previous-parameter)
 (defvar term-current-face 'term)
 (defvar term-scroll-start 0 "Top-most line (inclusive) of scrolling region.")
@@ -1015,7 +1017,6 @@ term-mode

   ;; These local variables are set to their local values:
   (make-local-variable 'term-saved-home-marker)
-  (make-local-variable 'term-terminal-parameter)
   (make-local-variable 'term-saved-cursor)
   (make-local-variable 'term-prompt-regexp)
   (make-local-variable 'term-input-ring-size)
@@ -1052,6 +1053,7 @@ term-mode
   (make-local-variable 'term-ansi-current-invisible)

   (make-local-variable 'term-terminal-parameter)
+  (make-local-variable 'term-terminal-undecoded-bytes)
   (make-local-variable 'term-terminal-previous-parameter)
   (make-local-variable 'term-terminal-previous-parameter-2)
   (make-local-variable 'term-terminal-previous-parameter-3)
@@ -2748,6 +2750,10 @@ term-emulate-terminal

           (when term-log-buffer
             (princ str term-log-buffer))
+          (when term-terminal-undecoded-bytes
+            (setq str (concat term-terminal-undecoded-bytes str))
+            (setq str-length (length str))
+            (setq term-terminal-undecoded-bytes nil))
           (cond ((eq term-terminal-state 4) ;; Have saved pending output.
                  (setq str (concat term-terminal-parameter str))
                  (setq term-terminal-parameter nil)
@@ -2763,13 +2769,6 @@ term-emulate-terminal
                                            str i))
                    (when (not funny) (setq funny str-length))
                    (cond ((> funny i)
-                          ;; Decode the string before counting
-                          ;; characters, to avoid garbling of certain
-                          ;; multibyte characters (bug#1006).
-                          (setq decoded-substring
-                                (decode-coding-string
-                                 (substring str i funny)
-                                 locale-coding-system))
                           (cond ((eq term-terminal-state 1)
                                  ;; We are in state 1, we need to wrap
                                  ;; around.  Go to the beginning of
@@ -2778,7 +2777,31 @@ term-emulate-terminal
                                  (term-down 1 t)
                                  (term-move-columns (- (term-current-column)))
                                  (setq term-terminal-state 0)))
+                          ;; Decode the string before counting
+                          ;; characters, to avoid garbling of certain
+                          ;; multibyte characters (bug#1006).
+                          (setq decoded-substring
+                                (decode-coding-string
+                                 (substring str i funny)
+                                 locale-coding-system))
                           (setq count (length decoded-substring))
+                          ;; Check for multibyte characters that ends
+                          ;; before end of string, and save it for
+                          ;; next time.
+                          (when (= funny str-length)
+                            (let ((partial 0))
+                              (while (eq (char-charset (aref decoded-substring
+                                                             (- count 1 partial)))
+                                         'eight-bit)
+                                (cl-incf partial))
+                              (when (> partial 0)
+                                (setq term-terminal-undecoded-bytes
+                                      (substring decoded-substring (- partial)))
+                                (setq decoded-substring
+                                      (substring decoded-substring 0 (- partial)))
+                                (cl-decf str-length partial)
+                                (cl-decf count partial)
+                                (cl-decf funny partial))))
                           (setq temp (- (+ (term-horizontal-column) count)
                                         term-width))
                           (cond ((or term-suppress-hard-newline (<= temp 0)))
-- 
2.9.3

--=-=-=--
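
The check the patch adds to term-emulate-terminal can also be tried on
its own, outside term.el. What follows is a minimal sketch under stated
assumptions, not code from the patch: `my-split-trailing-raw-bytes' is a
hypothetical helper name, the `(< partial len)' bound is an addition of
the sketch, and the coding system is assumed to be one like utf-8 whose
decoder leaves an incomplete trailing sequence as raw `eight-bit'
characters.

```elisp
;; Hypothetical helper, for illustration only; it is not in term.el.
(defun my-split-trailing-raw-bytes (bytes coding-system)
  "Decode unibyte string BYTES with CODING-SYSTEM.
Return (DECODED . LEFTOVER), where LEFTOVER holds the trailing
bytes that came out of the decoder as raw `eight-bit' characters."
  (let* ((decoded (decode-coding-string bytes coding-system))
         (len (length decoded))
         (partial 0))
    ;; Count raw-byte characters at the end of the decoded text.
    ;; The (< partial len) bound is an assumption of this sketch; it
    ;; keeps `aref' in range when every character is a raw byte.
    (while (and (< partial len)
                (eq (char-charset (aref decoded (- len 1 partial)))
                    'eight-bit))
      (setq partial (1+ partial)))
    ;; Each raw-byte character stands for exactly one input byte, so
    ;; the leftover can be cut from the end of BYTES.
    (cons (substring decoded 0 (- len partial))
          (substring bytes (- (length bytes) partial)))))

;; U+20AC is E2 82 AC in utf-8; feed it in two chunks.
(let* ((chunk1 (my-split-trailing-raw-bytes "a\xe2\x82" 'utf-8))
       (chunk2 (my-split-trailing-raw-bytes
                (concat (cdr chunk1) "\xac") 'utf-8)))
  (list (car chunk1) (car chunk2)))
```

The sketch returns the leftover bytes to the caller, whereas the patch
keeps them in the buffer-local term-terminal-undecoded-bytes and
prepends them to the next chunk the process filter receives.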