unofficial mirror of emacs-devel@gnu.org 
From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Cc: Eli Zaretskii <eliz@gnu.org>, Alain Schneble <a.s@realize.ch>,
	dgutov@yandex.ru, emacs-devel@gnu.org
Subject: Re: distinguishing multibyte/unibyte ASCII
Date: Fri, 09 Sep 2016 22:17:58 +0200
Message-ID: <87fup87rpl.fsf@toke.dk>
In-Reply-To: <jwvd1kc7t4v.fsf-monnier+Inbox@gnu.org> (Stefan Monnier's message of "Fri, 09 Sep 2016 16:01:57 -0400")

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> If you just generate an ASCII string from ASCII characters, it will
>> usually be unibyte.  If you take it as a substring from a multibyte
>> buffer, it will usually be multibyte.
>
> And it's arguably a wart in Emacs's handling of chars-vs-bytes.
> But it's kind of hard to fix now.
>
> At some point I tried to change this handling (not exactly fix it) by
> treating multibyte ASCII strings specially (it's easy to recognize by
> checking that the char length is equal to the byte length and both are
> readily available in the "struct Lisp_String" object).  Then when we
> read an ASCII string, instead of making it unibyte, I'd keep it as
> multibyte.  And then change things like "concat" so that those "ASCII
> multibyte" strings don't force the result to be multibyte.
>
> My local Emacs still runs with those changes, but in the end I don't
> think the result is really better (or sufficiently better to justify
> the subtle incompatibilities it introduces).
>
> [ Also, I wouldn't be surprised to hear that such a change causes real
>   problems with utf-7 or EBCDIC, or other systems where decoding/encoding
>   a string of bytes/chars all <127 is not a no-op.  ]
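
To make the length check described above concrete, here is a rough
sketch in Python rather than Emacs C or Lisp.  Everything in it
(ModelString, concat_current, concat_modified) is a made-up toy model,
not Emacs code.  It assumes the UTF-8 byte count is a fair stand-in for
the byte length kept in "struct Lisp_String", and it only models ASCII
and ordinary non-ASCII text, not raw bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelString:
    """Hypothetical stand-in for a Lisp string: text plus a multibyte flag."""
    text: str
    multibyte: bool

    @property
    def nchars(self) -> int:
        # Length in characters.
        return len(self.text)

    @property
    def nbytes(self) -> int:
        # Assumption: UTF-8 byte count approximates the internal byte length.
        return len(self.text.encode("utf-8"))

    def is_ascii_multibyte(self) -> bool:
        # The check from the quoted text: char length equals byte length.
        return self.multibyte and self.nchars == self.nbytes


def concat_current(*parts: ModelString) -> ModelString:
    # Toy version of the existing rule: any multibyte part forces multibyte.
    text = "".join(p.text for p in parts)
    return ModelString(text, any(p.multibyte for p in parts))


def concat_modified(*parts: ModelString) -> ModelString:
    # Toy version of the experiment: ASCII-only multibyte parts do not
    # force the result to be multibyte.
    text = "".join(p.text for p in parts)
    forced = any(p.multibyte and not p.is_ascii_multibyte() for p in parts)
    return ModelString(text, forced)


uni = ModelString("abc", multibyte=False)
ascii_multi = ModelString("abc", multibyte=True)
non_ascii = ModelString("\u00e9", multibyte=True)  # one char, two UTF-8 bytes

print(concat_current(uni, ascii_multi).multibyte)
print(concat_modified(uni, ascii_multi).multibyte)
print(concat_modified(uni, non_ascii).multibyte)
```

In this toy model an all-ASCII multibyte string no longer drags the
result of concat_modified to multibyte, while a string holding a real
non-ASCII character still does.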

Isn't Unicode fun? :)
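
As a small illustration of the utf-7/EBCDIC caveat quoted above, here
is a sketch using Python's standard codecs instead of Emacs coding
systems.  The helper name encoding_is_noop_for_ascii is made up for
this example, and "cp037" is just one EBCDIC code page picked as a
stand-in.

```python
def encoding_is_noop_for_ascii(text: str, codec: str) -> bool:
    """Return True if encoding ASCII-only `text` with `codec` yields the
    same bytes as plain ASCII, i.e. encoding is a no-op for this input."""
    return text.encode(codec) == text.encode("ascii")


# All characters below are < 127, yet not every codec leaves them alone.
for codec in ("utf-8", "utf-7", "cp037"):
    print(codec, encoding_is_noop_for_ascii("a+b", codec))
```

UTF-8 passes the text through unchanged, whereas UTF-7 has to escape
the "+" and the EBCDIC code page maps the letters to different byte
values, so "all chars <127" does not by itself mean the bytes match.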

-Toke



