From: Nathan Trapuzzano <nbtrap@nbtrap.com>
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Cc: Eli Zaretskii <eliz@gnu.org>,
	monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 13:01:17 -0400
Message-ID: <87a9c9aqhu.fsf@nbtrap.com>
In-Reply-To: <87ob0pnyt6.fsf@uwakimon.sk.tsukuba.ac.jp> (Stephen J. Turnbull's message of "Sat, 29 Mar 2014 18:23:17 +0900")

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> What is relevant is how to represent byte streams in Emacs.  The
> obvious non-unibyte way is a one-to-one mapping of bytes to Unicode
> characters.  It is *extremely* convenient if the first 128 of those
> bytes correspond to the ASCII coded character set, because so many
> wire protocols use ASCII "words" syntactically.  The other 128 don't
> matter much, so why not just use the extremely convenient Latin-1 set
> for them?

Sorry if someone brought this up already, but one reason raw bytes
shouldn't be represented as Latin-1 characters is that the "raw
bytes"-ness would be lost when writing them back to disk if the stream
also contained characters outside the Latin-1 range.

For example, say we decode a stream of raw bytes as UTF-8, but the
stream contains some sequences that are not valid UTF-8.  IIUC, Emacs
will interpret those as "raw bytes", so that when it goes to encode the
string to write it back, they will be written back verbatim.  Whereas,
if they had been interpreted as Latin-1 characters, they would get
written back as the UTF-8 encodings of those characters.  Hence you
would have the odd situation where you can decode and then encode and
end up with a different byte sequence.
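
To make the round-trip problem concrete, here is a minimal sketch in
Python rather than Emacs Lisp.  It is only an analogy: Python's
"surrogateescape" error handler stands in for a distinct raw-byte
representation, and the "latin1fallback" handler is a hypothetical
name registered here purely to imitate the Latin-1 mapping.  It is not
how Emacs implements either behaviour.

```python
import codecs


def _latin1_fallback(err):
    """Hypothetical handler: map each undecodable byte to its Latin-1 character."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


# "latin1fallback" is an illustrative name, not a standard error handler.
codecs.register_error("latin1fallback", _latin1_fallback)

# Valid UTF-8 for "café", followed by two bytes that are not valid UTF-8.
data = b"caf\xc3\xa9 \xff\xfe"

# Latin-1 mapping: the stray bytes become ordinary characters U+00FF, U+00FE,
# so re-encoding writes their two-byte UTF-8 forms instead of the originals.
as_latin1 = data.decode("utf-8", errors="latin1fallback")
latin1_roundtrip = as_latin1.encode("utf-8")

# Distinct raw-byte representation: the stray bytes stay marked as raw
# and are written back verbatim.
as_raw = data.decode("utf-8", errors="surrogateescape")
raw_roundtrip = as_raw.encode("utf-8", errors="surrogateescape")

print(latin1_roundtrip == data)  # False
print(raw_roundtrip == data)     # True
```

With the Latin-1 mapping the two stray bytes come back as four bytes;
with the raw-byte representation the output is identical to the input.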

Someone brought up Python in another post.  Python (version 3 at least)
does the same thing when, e.g., interpreting filenames.  If you pass a
string (_not_ bytes) to os.listdir, but the names in the directory
can't all be decoded as UTF-8, it will return strings (_not_ bytes)
in which the undecodable bytes are represented by Python-specific
"characters" (lone surrogate code points, U+DC80 through U+DCFF, via
the "surrogateescape" error handler) representing "raw bytes",
i.e. entities to be written back to the disk as the same raw sequences
that were read.
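
A short sketch of what those placeholder "characters" look like.  The
file name below is made up for illustration, and the codec is called
directly instead of going through os.listdir so that the example does
not depend on the local filesystem or locale.  In CPython the
placeholders are lone surrogates: byte 0xNN becomes U+DC00 + 0xNN.

```python
# Hypothetical name as stored on disk: "é" encoded as Latin-1 (0xE9),
# which is not valid UTF-8 in this position.
name_on_disk = b"r\xe9sum\xe9.txt"

# Decode the way Python decodes file names on POSIX systems.
name = name_on_disk.decode("utf-8", errors="surrogateescape")

# Each undecodable byte is now a lone surrogate in U+DC80..U+DCFF.
placeholders = [ch for ch in name if 0xDC80 <= ord(ch) <= 0xDCFF]
recovered = bytes(ord(ch) - 0xDC00 for ch in placeholders)

# A strict encode refuses the placeholders, so they cannot silently
# leak out as ordinary text.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with the same handler restores the original bytes.
same_bytes = name.encode("utf-8", errors="surrogateescape") == name_on_disk

print(recovered, strict_ok, same_bytes)
```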


