From: Eli Zaretskii <eliz@gnu.org>
To: Drew Adams <drew.adams@oracle.com>
Cc: cyd@gnu.org, 12054@debbugs.gnu.org
Subject: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 23:13:40 +0200
Message-ID: <83pq3u4cfv.fsf@gnu.org>
In-Reply-To: <0B444DBDD1D14FD7B5EDE10E30ED320D@us.oracle.com>

> From: "Drew Adams" <drew.adams@oracle.com>
> Date: Sat, 3 Nov 2012 12:01:29 -0700
> Cc: 12054@debbugs.gnu.org
>
> I think I understand this (but I might be misunderstanding). The \240 in the
> 4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
> the char I wanted.
Yes.
> That is, the literal string in my code is read as a string that contains only a
> single raw byte of octal 240 in place of the 4 chars \240 (and instead of as a
> string with the multibyte char no-break space). Is that right?
Yes.
> And putting that together with Eli's statement about insertion ("'insert' treats
> strings such as "\nnn" as unibyte strings"), I understand that the buffer text
> after I type `C-q 240' contains a unibyte raw byte, and not the multibyte char
> no-break space.
No. It contains the NBSP. Try it. C-q inserts a multibyte
character, unlike '(insert "\240")', for example.
> But in that case I do not understand why `C-u C-x =' says that it _is_ the
> Unicode no-break space char.
Because it is.
> And I do not understand why Yidong's font-lock correction also shows
> that it is a no-break space char.
Chong didn't use "\240".
> So I'm confused about what is actually in the buffer. From the doc and from
> Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
> position. But `C-u C-x =' and font-lock seem to tell me that there is a
> (multibyte) no-break space char there.
Try '(insert "\240")'; "C-x =" will then show a raw 8-bit byte, not
the NBSP.
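The difference can be checked without typing anything interactively.
What follows is only a minimal sketch, assuming Emacs 23 or later and a
multibyte buffer; the function name `my-nbsp-vs-raw-byte' is made up for
illustration and is not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes Emacs 23 or later.
(defun my-nbsp-vs-raw-byte ()
  "Return the character codes produced by two ways of inserting octal 240."
  (with-temp-buffer
    (set-buffer-multibyte t)      ; make sure the buffer is multibyte
    (insert "\240")               ; unibyte string literal: a raw 8-bit byte
    (insert #xA0)                 ; character argument: U+00A0, the NBSP
    (list (char-after 1) (char-after 2))))

;; (my-nbsp-vs-raw-byte) should return (#x3FFFA0 #xA0): the first element
;; is the raw byte, the second is the no-break space.
```

Only the second character is the one that "C-q 240" puts in the buffer.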
> > (One reason for doing this is to allow unibyte strings to
> > be specified using string constants in Emacs Lisp source code.)
>
> I can see how that can be useful. But I can also see how it would be useful to
> have some way of using octal syntax to match multibyte chars. Isn't there some
> reasonable way to allow for both?
Maybe, but we didn't find one, at least not one that would be
backward-compatible.
> Is there, for example, (or could there be added) a function that one can apply
> to the unibyte string for \240 that would convert it to a string that DTRT wrt
> multibyte?
Such functions do exist, see the "Converting Representations" node in
the ELisp manual.
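As a rough illustration of what those functions do with "\240" (a
sketch only, assuming Emacs 23 or later; `my-octal-240-conversions' is a
made-up name): `string-to-multibyte' keeps the raw byte, while decoding
the byte as Latin-1 yields the NBSP character.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes Emacs 23 or later.
(defun my-octal-240-conversions ()
  "Show how the unibyte string \"\\240\" converts to multibyte."
  (let ((raw "\240"))             ; one raw byte, octal 240
    (list
     ;; The literal itself is a unibyte string.
     (multibyte-string-p raw)
     ;; Converting the representation keeps it a raw 8-bit byte.
     (string-to-char (string-to-multibyte raw))
     ;; Decoding it as Latin-1 text produces U+00A0, the NBSP.
     (string-to-char (decode-coding-string raw 'latin-1)))))

;; (my-octal-240-conversions) should return (nil #x3FFFA0 #xA0).
```

Which conversion is appropriate depends on what the bytes are supposed
to mean; the Latin-1 decoding above is just one possible choice.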
> (decode-coding-string "\302\240" 'utf-8)
>
> That allows use of only octal syntax - good. But it still doesn't solve the
> problem for older Emacs versions - they raise the error (coding-system-error
> utf-8).
You don't want this, because even if you succeed in producing a NBSP
in Emacs 22 and older, the result will not match NBSP in other
charsets. It's simply impossible with those versions of Emacs.