From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 23:13:40 +0200
Message-ID: <83pq3u4cfv.fsf@gnu.org>
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org> <0B444DBDD1D14FD7B5EDE10E30ED320D@us.oracle.com>
In-reply-to: <0B444DBDD1D14FD7B5EDE10E30ED320D@us.oracle.com>
To: Drew Adams
Cc: cyd@gnu.org, 12054@debbugs.gnu.org

> From: "Drew Adams"
> Date: Sat, 3 Nov 2012 12:01:29 -0700
> Cc: 12054@debbugs.gnu.org
>
> I think I understand this (but I might be misunderstanding). The \240 in the
> 4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
> the char I wanted.

Yes.

> That is, the literal string in my code is read as a string that contains only a
> single raw byte of octal 240 in place of the 4 chars \240 (and instead of as a
> string with the multibyte char no-break space). Is that right?

Yes.

> And putting that together with Eli's statement about insertion ("'insert' treats
> strings such as "\nnn" as unibyte strings"), I understand that the buffer text
> after I type `C-q 240' contains a unibyte raw byte, and not the multibyte char
> no-break space.

No. It contains the NBSP. Try it. C-q inserts a multibyte character,
unlike '(insert "\240")', for example.

> But in that case I do not understand why `C-u C-x =' says that it _is_ the
> Unicode no-break space char.

Because it is.

> And I do not understand why Yidong's font-lock correction also shows
> that it is a no-break space char.

Chong didn't use "\240".

> So I'm confused about what is actually in the buffer. From the doc and from
> Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
> position. But `C-u C-x =' and font-lock seem to tell me that there is a
> (multibyte) no-break space char there.

Try '(insert "\240")' and then "C-x =" will show a unibyte byte.

> > (One reason for doing this is to allow unibyte strings to
> > be specified using string constants in Emacs Lisp source code.)
>
> I can see how that can be useful. But I can also see how it would be useful to
> have some way of using octal syntax to match multibyte chars. Isn't there some
> reasonable way to allow for both?
Maybe, but we didn't find one, at least not one that would be
backward-compatible.

> Is there, for example, (or could there be added) a function that one can apply
> to the unibyte string for \240 that would convert it to a string that DTRT wrt
> multibyte?

Such functions do exist, see the "Converting Representations" node in
the ELisp manual.

> (decode-coding-string "\302\240" 'utf-8)
>
> That allows use of only octal syntax - good. But it still doesn't solve the
> problem for older Emacs versions - they raise the error (coding-system-error
> utf-8).

You don't want this, because even if you succeed in producing a NBSP in
Emacs 22 and older, the result will not match NBSP in other charsets.
It's simply impossible with those versions of Emacs.
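The sketch below checks these distinctions directly in Emacs Lisp, assuming Emacs 23 or later. The names `my-raw-240`, `my-nbsp` and `my-first-char-after-insert` are hypothetical and used only for illustration.

The sketch does three things:
- It contrasts the unibyte literal "\240" with the real NBSP character.
- It shows what lands in the buffer after `insert`.
- It builds the NBSP from octal syntax alone by decoding the byte with `iso-latin-1`. This is an alternative to the UTF-8 form quoted above; the choice is an assumption and was not proposed in the thread.

```elisp
;;; Sketch assuming Emacs 23 or later.  All `my-' names are hypothetical.

(defconst my-raw-240 "\240"
  "Unibyte string: only octal escapes below 256, so it holds the raw byte #o240.")

(defconst my-nbsp (decode-coding-string "\240" 'iso-latin-1)
  "Multibyte string holding U+00A0 NO-BREAK SPACE, written with octal syntax only.
Decoding the byte #o240 as Latin-1 yields the character, not a raw byte.")

(defun my-first-char-after-insert (string)
  "Insert STRING into a fresh multibyte buffer and return its first character."
  (with-temp-buffer
    (insert string)
    (char-after (point-min))))

;; (insert "\240") leaves a raw byte in the buffer, which `C-x =' reports:
;; (my-first-char-after-insert my-raw-240)   ; => 4194208 (#x3FFFA0)

;; Inserting the real character, as C-q 240 does, gives code point 160:
;; (my-first-char-after-insert my-nbsp)      ; => 160 (#xA0)

;; `string-to-multibyte' is not a fix here: it keeps the byte raw.
;; (aref (string-to-multibyte my-raw-240) 0) ; => 4194208

;; The decoded string works as a regexp that matches NBSP in multibyte text:
;; (string-match my-nbsp "a\u00A0b")         ; => 1
```

Per the last point above, this does not help Emacs 22 and older. There, decoding yields a character from one particular charset, which will not match NBSP coming from other charsets.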