From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.bugs
Subject: bug#17758: 24.3; The line isn't wrapped during printing if it contains non ascii chars
Date: Sun, 30 May 2021 06:04:28 +0200
Message-ID: <87czt87tpv.fsf@gnus.org>
To: Mattias Engdegård
Cc: Kenichi Handa, 17758@debbugs.gnu.org, tobias.bora@gmail.com
In-Reply-To: (Mattias Engdegård's message of "Sat, 29 May 2021 10:09:27 +0200")

Mattias Engdegård writes:

> Yes, this is by design. The former is the range U+0000..U+00FF and
> covers Latin-1. The latter covers ASCII and raw bytes, but no
> non-ASCII Unicode characters.
>
> It is what we get when one interval endpoint is ASCII and the other a
> raw byte; it is mainly intended for regexp matching in unibyte
> text. Since the Emacs multibyte encoding puts the raw bytes 80..FF at
> the very end of the character numbering, [\000-\377] would include
> every Unicode character (which would be very surprising) if we didn't
> have this mechanism.

It makes sense, but while playing with this I looked at the "range" item
in the "Regexp Special" node in the elisp manual, and this quirk doesn't
seem to be documented there. (It's a long node, though, so perhaps I
missed something when skimming it.)

And this bit seems to actively discourage the change I made here:

--
   Although a range's bound can be almost any character, it is better
   style to stay within natural sequences of ASCII letters and digits
   because most people have not memorized character code tables. For
   example, @samp{[.-9]} is less clear than @samp{[./0-9]}, and
   @samp{[`-~]} is less clear than @samp{[`a-z@{|@}~]}. Unicode
   character escapes can help here; for example, for most programmers
   @samp{[ก-ฺ฿-๛]} is less clear than @samp{[\u0E01-\u0E3A\u0E3F-\u0E5B]}.
--

So I think a couple of lines mentioning raw bytes vs. Latin-1 would be
helpful here.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
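The range rule Mattias describes can be modelled roughly like this. The sketch makes two assumptions. First, raw byte b (80..FF) is numbered #x3FFF00 + b, which places the raw bytes at #x3FFF80..#x3FFFFF, the very end of Emacs's character space. Second, an "ASCII endpoint to raw-byte endpoint" range is taken to mean "ASCII up to 127, plus raw bytes up to the upper bound". The helper names are hypothetical, and this is not the actual code of Emacs's regexp engine.

```python
# Simplified, hypothetical model of Emacs character ranges in multibyte text.
# Not Emacs's regexp engine: it only illustrates the rule discussed above.

MAX_UNICODE = 0x10FFFF    # highest Unicode code point
RAW_BYTE_BASE = 0x3FFF00  # assumption: raw byte b (0x80..0xFF) is char BASE + b
MAX_CHAR = 0x3FFFFF       # value returned by (max-char) in Emacs


def raw_byte_char(b: int) -> int:
    """Character code used for the raw (eight-bit) byte b."""
    assert 0x80 <= b <= 0xFF
    return RAW_BYTE_BASE + b


def is_raw_byte(c: int) -> bool:
    """True if c lies in the raw-byte block at the end of the numbering."""
    return raw_byte_char(0x80) <= c <= MAX_CHAR


def naive_range(lo: int, hi: int) -> range:
    """Plain numeric interval over the character numbering."""
    return range(lo, hi + 1)


def in_range(c: int, lo: int, hi: int) -> bool:
    """Membership with the special case: an ASCII lower bound and a
    raw-byte upper bound match ASCII and raw bytes only, skipping every
    non-ASCII Unicode character in between."""
    if lo <= 0x7F and is_raw_byte(hi):
        return lo <= c <= 0x7F or raw_byte_char(0x80) <= c <= hi
    return lo <= c <= hi


if __name__ == "__main__":
    lo, hi = 0x00, raw_byte_char(0xFF)  # models [\000-\377]
    a_ring = 0xE5                       # U+00E5, "å"

    # A purely numeric interval would swallow all of Unicode.
    assert MAX_UNICODE in naive_range(lo, hi)

    # With the special case: ASCII and raw bytes, but no non-ASCII Unicode.
    assert in_range(ord("a"), lo, hi)
    assert not in_range(a_ring, lo, hi)
    assert in_range(raw_byte_char(0xE5), lo, hi)

    # A range over U+0000..U+00FF covers Latin-1, so it does contain "å".
    assert in_range(a_ring, 0x00, 0xFF)
    print("ok")
```

Under this model, the raw byte E5 and the character U+00E5 share the same low byte but are distinct characters. That is presumably why the documentation would benefit from spelling out which of the two a range like [\000-\377] matches.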