From: David Kastrup <dak@gnu.org>
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Cc: Eli Zaretskii <eliz@gnu.org>,
monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 11:42:43 +0100 [thread overview]
Message-ID: <87y4ztp9p8.fsf@fencepost.gnu.org> (raw)
In-Reply-To: <87ob0pnyt6.fsf@uwakimon.sk.tsukuba.ac.jp> (Stephen J. Turnbull's message of "Sat, 29 Mar 2014 18:23:17 +0900")
"Stephen J. Turnbull" <stephen@xemacs.org> writes:
> Eli Zaretskii writes:
>
> > How is it different? What would be the encoding of a buffer that
> > contains raw bytes?
>
> Depends. If it's uninterpreted bytes, "binary." If those are
> undecodable bytes, they'll be the representation of raw bytes that
> occurred in an otherwise sane encoded stream, and the buffer's
> encoding will be the nominal encoding of that stream.
It's worth pointing out that there is no such thing as a "buffer's
encoding" in general in Emacs. Buffers are sequences of characters or,
in the case of a unibyte buffer, bytes. Encodings come into play for
import/export only but they are not an inherent property of the buffer
as such but rather, for example, of the file association of the buffer.
Emacs has two kinds of internal representation (what one might actually
want to call "buffer encoding"): unibyte and multibyte. XEmacs, I
think, has only one.
The current point of contention is about changing the way of
codepoint-based character operations depending on the unibyte state of
the current buffer.
I consider that an astonishingly bad idea since character and string
operations are not tied to a particular buffer. The whole point of MULE
from a rather early point of time on was to deal with only a single
Unicode-based character set in all of Emacs. Making character
operations change meaning based on a buffer's unibyte status means a
return to the character set semantics of Emacs 19.
I am not necessarily of the same opinion as Stephen regarding whether or
not abolishing unibyte buffers is a worthwhile goal. But I am pretty
sure that "unibyte" should not be bleeding over into character and
string operations.
A unibyte buffer or unibyte string might error out when trying to insert
characters out of the range 0..255. That's an obvious consequence of
the buffer's representation.
If we want different semantics for case-fold-search in binary buffers,
then the solution is setting a buffer-local setting of case-fold-search
when opening a buffer intended to be manipulated in a binary way.
But the unibyte setting of the buffer should not affect normal character
and string operation semantics. It is a buffer implementation detail
that should not really have a visible effect apart from making some
buffer operations impossible.
Whether or not we want to abolish unibyte buffer representations, we
don't want this to bleed effects beyond the buffer representation.
If something chooses a unibyte buffer representation for some reason, it
is the responsibility of the same something to switch character
operations and case-fold-search etc to something making sense in the
context of its operation. That may well be through some buffer-local
setting of case-fold-search etc, but it is not tied to the internal
representation of the buffer contents.
--
David Kastrup
next prev parent reply other threads:[~2014-03-29 10:42 UTC|newest]
Thread overview: 103+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-26 19:04 Buffer-local variables affect general-purpose functions Eli Zaretskii
2014-03-26 19:32 ` Paul Eggert
2014-03-26 20:03 ` Eli Zaretskii
2014-03-26 21:50 ` Paul Eggert
2014-03-27 17:42 ` Eli Zaretskii
2014-03-27 18:55 ` Paul Eggert
2014-03-27 14:17 ` Stefan Monnier
2014-03-27 17:17 ` Eli Zaretskii
2014-03-27 21:04 ` Stefan Monnier
2014-03-28 7:11 ` Eli Zaretskii
2014-03-28 7:46 ` Paul Eggert
2014-03-28 8:18 ` Unibyte characters, strings and buffers Eli Zaretskii
2014-03-28 18:42 ` Paul Eggert
2014-03-28 18:52 ` Eli Zaretskii
2014-03-28 19:21 ` Paul Eggert
2014-03-29 6:40 ` Eli Zaretskii
2014-03-29 18:57 ` Paul Eggert
2014-03-29 19:46 ` Eli Zaretskii
2014-03-28 20:23 ` Stefan Monnier
2014-03-29 19:34 ` Stefan Monnier
2014-03-28 14:12 ` Buffer-local variables affect general-purpose functions Stefan Monnier
2014-03-28 3:38 ` Stephen J. Turnbull
2014-03-28 8:51 ` Unibyte characters, strings, and buffers Eli Zaretskii
2014-03-28 10:28 ` Stephen J. Turnbull
2014-03-28 10:58 ` David Kastrup
2014-03-28 11:22 ` Andreas Schwab
2014-03-28 11:34 ` David Kastrup
2014-03-28 11:42 ` Stephen J. Turnbull
2014-03-28 17:29 ` Eli Zaretskii
2014-03-28 17:50 ` David Kastrup
2014-03-28 18:31 ` Eli Zaretskii
2014-03-28 19:25 ` David Kastrup
2014-03-29 6:43 ` Eli Zaretskii
2014-03-29 7:23 ` David Kastrup
2014-03-29 8:24 ` Eli Zaretskii
2014-03-29 8:40 ` David Kastrup
2014-03-29 9:25 ` Eli Zaretskii
2014-03-28 20:27 ` Stefan Monnier
2014-03-29 9:23 ` Stephen J. Turnbull
2014-03-29 9:52 ` Andreas Schwab
2014-03-29 10:48 ` Eli Zaretskii
2014-03-29 11:00 ` Andreas Schwab
2014-03-29 11:18 ` Eli Zaretskii
2014-03-29 11:30 ` Andreas Schwab
[not found] ` <83ha6hduzz.fsf@gnu.org>
2014-03-29 14:30 ` Andreas Schwab
2014-03-29 14:47 ` Eli Zaretskii
2014-03-29 10:42 ` David Kastrup [this message]
2014-03-29 11:07 ` Eli Zaretskii
2014-03-29 11:30 ` David Kastrup
2014-03-29 12:58 ` Eli Zaretskii
2014-03-29 13:15 ` David Kastrup
2014-03-29 10:44 ` Eli Zaretskii
2014-03-29 11:06 ` Andreas Schwab
2014-03-29 11:12 ` Eli Zaretskii
2014-03-29 16:11 ` Stephen J. Turnbull
2014-03-29 15:37 ` Stephen J. Turnbull
2014-03-29 15:55 ` David Kastrup
2014-03-29 16:28 ` Stephen J. Turnbull
2014-03-29 17:00 ` David Kastrup
2014-03-30 2:05 ` Stephen J. Turnbull
2014-03-30 9:01 ` David Kastrup
2014-03-30 12:13 ` Stephen J. Turnbull
2014-03-30 14:25 ` Andreas Schwab
2014-03-30 15:05 ` David Kastrup
2014-03-30 15:39 ` Andreas Schwab
2014-03-29 17:08 ` Andreas Schwab
2014-03-30 0:24 ` Richard Stallman
2014-03-30 3:32 ` Stefan Monnier
2014-03-30 15:13 ` Richard Stallman
2014-03-29 15:58 ` Andreas Schwab
2014-03-29 16:35 ` Stephen J. Turnbull
2014-03-29 17:06 ` Andreas Schwab
2014-03-29 17:01 ` Nathan Trapuzzano
2014-03-29 17:08 ` Nathan Trapuzzano
2014-03-29 17:18 ` David Kastrup
2014-03-29 17:33 ` Nathan Trapuzzano
2014-03-30 0:24 ` Richard Stallman
2014-03-30 8:38 ` Andreas Schwab
2014-03-30 15:12 ` Richard Stallman
2014-03-29 17:16 ` David Kastrup
2014-03-28 18:45 ` Daniel Colascione
2014-03-28 19:35 ` Glenn Morris
2014-03-29 11:17 ` Stephen J. Turnbull
2014-03-29 11:22 ` Eli Zaretskii
2014-03-29 16:03 ` Stephen J. Turnbull
2014-03-31 15:22 ` Eli Zaretskii
2014-04-01 3:36 ` Stephen J. Turnbull
2014-04-01 7:42 ` David Kastrup
2014-04-01 9:38 ` Stephen J. Turnbull
2014-04-01 15:19 ` Eli Zaretskii
2014-04-01 15:16 ` Eli Zaretskii
2014-04-02 4:20 ` Stephen J. Turnbull
2014-04-02 17:06 ` Eli Zaretskii
2014-04-03 10:59 ` David Kastrup
2014-04-03 16:07 ` Eli Zaretskii
2014-04-03 16:26 ` David Kastrup
2014-04-03 19:11 ` Eli Zaretskii
2014-04-03 20:03 ` David Kastrup
2014-04-04 0:48 ` Stephen J. Turnbull
2014-04-04 8:08 ` Eli Zaretskii
2014-04-04 7:58 ` Eli Zaretskii
2014-04-04 11:40 ` Richard Stallman
2014-04-03 13:04 ` Stephen J. Turnbull
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y4ztp9p8.fsf@fencepost.gnu.org \
--to=dak@gnu.org \
--cc=eliz@gnu.org \
--cc=emacs-devel@gnu.org \
--cc=monnier@IRO.UMontreal.CA \
--cc=stephen@xemacs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).