From: Sebastian Tennant <sebyte@smolny.plus.com>
To: help-gnu-emacs@gnu.org
Subject: Re: mapconcat + format problem
Date: Fri, 08 Jun 2007 13:58:01 +0300
Message-ID: <87wsyeemrq.fsf@moley.org>
In-Reply-To: 1181294738.059057.112330@m36g2000hse.googlegroups.com
Quoth Peter Tury <tury.peter@gmail.com>:
> Hi,
>
> I found that
>
> (mapconcat (lambda (n)
>              (format "%c" n))
>            (number-sequence start end)
>            "")
>
> gives different results for characters after ~160 depending on `start':
> if `start' is 0 then I get "readable" results, but otherwise I get the
> characters' octal code. E.g. after
>
> (defun to-string (start end)
>   (mapconcat (lambda (n)
>                (format "%c" n))
>              (number-sequence start end)
>              ""))
>
> (equal (substring (to-string 0 190) 1)
>        (substring (to-string 1 190) 0))
>
> evaluates to nil. Why?
(set-buffer-multibyte nil)
=> nil
(to-string 0 190)
"...{|}~\x7f\x80\x81..."
(to-string 1 190)
"... {|}~\x7f\200\201..."
It seems to me the output begins to differ after character code 127,
rather than ~ 160:
(format "%c" 127)
=> "^?"
which suggests the issue has something to do with the way mapconcat
is handling ASCII versus non-ASCII characters.
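Rather than eyeballing the printed strings, the position where the two
results first diverge can be located programmatically. The following is
only a minimal sketch: `my-first-difference' is a hypothetical helper
name, not something from this thread or from Emacs itself, and it
compares the values returned by `aref', which may not be the same thing
as what the printer shows.

```elisp
;; Hypothetical helper (not part of Emacs): find the first index at
;; which two strings hold different character codes.
(defun my-first-difference (a b)
  "Return the first index where strings A and B differ, or nil if none.
If one string is a proper prefix of the other, return the shorter length."
  (let ((i 0)
        (len (min (length a) (length b)))
        (found nil))
    ;; Walk both strings in step until a mismatch or the shorter end.
    (while (and (< i len) (not found))
      (if (= (aref a i) (aref b i))
          (setq i (1+ i))
        (setq found i)))
    (cond (found found)                      ; index of the mismatch (may be 0)
          ((/= (length a) (length b)) len)   ; one string is a prefix
          (t nil))))                         ; no difference found
```

With `to-string' defined as in the quoted message, evaluating
(my-first-difference (substring (to-string 0 190) 1) (to-string 1 190))
should then report where the two strings part ways, if they do.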
However, this is interesting:
(info "(elisp)Text Representations")
In multibyte representation, a character may occupy more than one
byte, and as a result, the full range of Emacs character codes can be
stored. The first byte of a multibyte character is always in the range
128 through 159 (octal 0200 through 0237). These values are called
"leading codes". The second and subsequent bytes of a multibyte
character are always in the range 160 through 255 (octal 0240 through
0377); these values are "trailing codes".
Some sequences of bytes are not valid in multibyte text: for example,
a single isolated byte in the range 128 through 159 is not allowed. But
character codes 128 through 159 can appear in multibyte text,
represented as two-byte sequences. All the character codes 128 through
255 are possible (though slightly abnormal) in multibyte text; they
appear in multibyte buffers and strings when you do explicit encoding
and decoding (*note Explicit Encoding::).
But this is all about text representation in buffers and won't explain
why your equality test:
(equal (substring (to-string 0 190) 1)
(substring (to-string 1 190) 0))
fails.
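One thing that may be worth checking is whether the two strings use the
same representation, since the manual section quoted above distinguishes
unibyte from multibyte text and, as far as I know, a unibyte and a
multibyte string are not guaranteed to compare `equal' once non-ASCII
codes are involved. The sketch below is an assumption-laden diagnostic,
not an explanation: `my-describe-string' is a hypothetical helper name,
built only on `multibyte-string-p', `length' and `string-bytes'.

```elisp
;; Hypothetical helper (not part of Emacs): summarize how a string
;; is stored, so two results can be compared side by side.
(defun my-describe-string (s)
  "Return a plist describing the representation of string S."
  (list :multibyte (multibyte-string-p s)  ; t if S is a multibyte string
        :chars (length s)                  ; number of characters
        :bytes (string-bytes s)))          ; number of bytes used internally
```

Calling it on (to-string 0 190) and on (to-string 1 190) would show
whether one result is unibyte and the other multibyte, which would be one
possible reason for the differing printed forms.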
Sorry! Not much help :-/
Sebastian