From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Chong Yidong <cyd@stupidchicken.com>
Cc: intrigeri@boum.org, emacs-devel@gnu.org,
103@emacsbugs.donarmstrong.com, Kenichi Handa <handa@m17n.org>
Subject: Re: 23.0.60; Segmentation fault loading auto-lang.el
Date: Tue, 08 Apr 2008 21:42:14 -0400
Message-ID: <jwvbq4jn7es.fsf-monnier+emacs@gnu.org>
In-Reply-To: <87skxwl29o.fsf@stupidchicken.com> (Chong Yidong's message of "Tue, 08 Apr 2008 12:50:11 -0400")
>>> (let ((str (string-as-unibyte "ä")))
>>> (string-match (char-to-string (string-to-char str)) str))
>>
>>> evaluates to 0 in Emacs 22, and to nil in Emacs 23. It turns out that
>>> this screws up the use of all-completions in regexp-opt-group.
>>
>>> Anyone have any idea what's going on here?
>>
>> (string-as-unibyte "ä") => "\303\244"
>> (string-to-char "\303\244") => 195 (because ?\303 == 195)
>> (char-to-string 195) => "Ã" (because 195==0xC3 U+00C3=='Ã')
>> (string-match "Ã" "ä") => nil (obvious)
>>
>> Any Lisp program that depends on the result of
>> string-as-unibyte (thus Emacs' internal character
>> representation) won't work in Emacs 23.
Notice that the problem is unrelated to string-as-unibyte:

    (string-match (char-to-string (string-to-char str)) str)

this should intuitively always return 0.  Of course, once you replace
`char-to-string' with just `string', you may be reminded that Emacs-23
introduced `unibyte-string', which leads you to the key: if `str' is
unibyte, you need to do

    (string-match (unibyte-string (string-to-char str)) str)

In Emacs-22, `string' used a heuristic to decide whether to build
a unibyte or multibyte string, and more importantly, the character
representing byte code 209 had code 209, whereas in Emacs-23, we have
the strange situation that byte 209 is character 4194257.

So an integer <256 needs to be accompanied by some contextual info
that says whether it represents a char or a byte, otherwise you get
ambiguity that leads to bugs.  And string-to-char returns either a byte
or a char depending on whether the string was unibyte or multibyte.
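
To make the byte/char distinction concrete, here is a minimal sketch of
carrying that contextual info along with the integer.  The helper
`my-first-element-string' is a hypothetical name used only for
illustration; it is not part of Emacs or of regexp-opt.el, and it is
not the change proposed in this thread.  It assumes Emacs 23 or later,
where `unibyte-string' exists.

```elisp
;; Hypothetical helper (an assumption for illustration, not an Emacs API).
(defun my-first-element-string (str)
  "Return a one-element string holding the first element of STR.
The result is unibyte when STR is unibyte, multibyte otherwise."
  (let ((c (string-to-char str)))
    (if (multibyte-string-p str)
        (string c)             ; C is a character code
      (unibyte-string c))))    ; C is a raw byte in the range 0..255

;; The two UTF-8 bytes of "ä", held in a unibyte string.
(let ((str (unibyte-string #xC3 #xA4)))
  (string-match (regexp-quote (my-first-element-string str)) str))
;; => 0

;; The same integer read as a raw byte maps to a different character:
(unibyte-char-to-multibyte 209)
;; => 4194257, i.e. #x3FFF00 + 209
```

The point of the sketch is only that the decision between `string' and
`unibyte-string' is made from the string the integer came from, not
from the integer itself.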
> I see. However, maybe the following change to regexp-opt-group in
> regexp-opt.el would make things a little more predictable. What do you
> think?
Yes, it looks like a good fix. Maybe "-no-properties" would be even
better.
Stefan