From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters Date: Tue, 22 Mar 2016 18:13:15 +0200 Message-ID: <831t725w4k.fsf@gnu.org> References: Reply-To: Eli Zaretskii NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Trace: ger.gmane.org 1458663276 14088 80.91.229.3 (22 Mar 2016 16:14:36 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 22 Mar 2016 16:14:36 +0000 (UTC) Cc: 23086@debbugs.gnu.org To: Philipp Stephani Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Tue Mar 22 17:14:20 2016 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1aiOwh-000077-MK for geb-bug-gnu-emacs@m.gmane.org; Tue, 22 Mar 2016 17:14:15 +0100 Original-Received: from localhost ([::1]:38226 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiOwh-0000w4-3I for geb-bug-gnu-emacs@m.gmane.org; Tue, 22 Mar 2016 12:14:15 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:37336) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiOwX-0000v4-T1 for bug-gnu-emacs@gnu.org; Tue, 22 Mar 2016 12:14:11 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aiOwT-0007D2-NT for bug-gnu-emacs@gnu.org; Tue, 22 Mar 2016 12:14:05 -0400 Original-Received: from debbugs.gnu.org ([208.118.235.43]:34831) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiOwT-0007Cx-Jf for bug-gnu-emacs@gnu.org; Tue, 22 Mar 2016 12:14:01 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1aiOwT-00009U-GG for bug-gnu-emacs@gnu.org; Tue, 22 Mar 2016 12:14:01 
-0400 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Tue, 22 Mar 2016 16:14:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 23086 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 23086-submit@debbugs.gnu.org id=B23086.1458663226556 (code B ref 23086); Tue, 22 Mar 2016 16:14:01 +0000 Original-Received: (at 23086) by debbugs.gnu.org; 22 Mar 2016 16:13:46 +0000 Original-Received: from localhost ([127.0.0.1]:60191 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aiOwD-00008u-JL for submit@debbugs.gnu.org; Tue, 22 Mar 2016 12:13:45 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:46270) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1aiOwB-00008h-LE for 23086@debbugs.gnu.org; Tue, 22 Mar 2016 12:13:44 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aiOw2-00078b-FX for 23086@debbugs.gnu.org; Tue, 22 Mar 2016 12:13:38 -0400 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:58336) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiOw2-00078W-Bv; Tue, 22 Mar 2016 12:13:34 -0400 Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:4094 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1aiOw1-0008Hb-OP; Tue, 22 Mar 2016 12:13:34 -0400 In-reply-to: (message from Philipp Stephani on Tue, 22 Mar 2016 11:42:46 +0100) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text 
editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:115331 Archived-At:

> From: Philipp Stephani
> Date: Tue, 22 Mar 2016 11:42:46 +0100
>
> Type some characters
> C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
> Type some more characters
> M-q
>
> Expected behavior: Emacs treats these characters as line and paragraph
> separators: they are displayed as line breaks, M-q doesn't remove them,
> and forward-paragraph etc. treat the paragraph separator as paragraph
> end.
>
> Actual behavior: These characters are displayed as one-pixel horizontal
> whitespace and otherwise ignore.
>
> Also discussed in
> https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
> https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
> support for these characters, but I think proper treatment of Unicode
> separators should be part of Emacs.

It is not clear to me what exactly is the requested feature.  Can you
propose a detailed list of requirements?

I'm asking because these characters come in Unicode with non-trivial
baggage that is a far cry from just breaking the line; see

  http://unicode.org/reports/tr14/
  http://unicode.org/reports/tr29/

There are also implications for the bidirectional display (it is
sensitive to where the line and the paragraph begin and end).  If we
want to support these two characters, we should think about which
parts of the relevant functionality we want to see in Emacs, because
users will expect that.

In addition, there are other white-space characters defined by
Unicode, and it would make sense to treat them all alike.  I'm not
sure it makes sense to support just the line-breaking and
paragraph-separator parts of only these two characters.

Then there are Emacs-specific issues, for example:

  . do we treat U+2028 and U+2029 as literal characters, or as a form
    of EOL encoding?
  . if the former, how do we distinguish them from newlines on
    display?
  . should Isearch find these when looking for "\n"?  how about
    regexp search for "$"?

There are probably more implications; these are just the ones that
popped into my mind in 5 sec.

IOW, I think Someone™ should think this over and present a detailed
proposal.

Thanks.
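
For anyone drafting such a proposal, the "literal character vs. EOL" question and the "\n" / "$" search questions above can be made concrete with a small sketch. It uses Python rather than Emacs Lisp purely as an illustration, because Python's standard library happens to take both views: str.splitlines() treats U+2028 and U+2029 as line boundaries, while str.split("\n") and the "$" anchor of the re module do not. The sample string and the variable names are made up for the example, and nothing here describes how Emacs behaves or should behave.

```python
import re
import unicodedata

LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR

# Hypothetical sample text: two Unicode separators and one ordinary newline.
text = f"one{LS}two{PS}three\nfour"

# Unicode general categories of the two characters (Zl and Zp).
categories = [unicodedata.category(ch) for ch in (LS, PS)]

# View 1: separators act as line boundaries (an "EOL-like" treatment).
as_eol = text.splitlines()

# View 2: separators are literal characters; only "\n" ends a line.
as_literal = text.split("\n")

# The regexp "$" in multiline mode matches before "\n" and at the end of
# the string, but not before U+2028 or U+2029.
dollar_positions = [m.start() for m in re.finditer(r"$", text, re.MULTILINE)]

print(categories)
print(as_eol)
print(as_literal)
print(dollar_positions)
```

The two views disagree on how many lines the same text has, which is the kind of inconsistency a detailed proposal would need to settle for display, filling, and search alike.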