unofficial mirror of emacs-devel@gnu.org 
From: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: mituharu@math.s.chiba-u.ac.jp, emacs-devel@gnu.org
Subject: Re: [davidsmith@acm.org: [patch] url-hexify-string does not follow W3C spec]
Date: Tue, 01 Aug 2006 10:32:05 -0400
Message-ID: <jwvy7u8h6yz.fsf-monnier+emacs@gnu.org>
In-Reply-To: <E1G7oSU-0005rn-00@etlken> (Kenichi Handa's message of "Tue, 01 Aug 2006 16:14:30 +0900")

>>>> What incompatibility?  If the string only contains ASCII and
>>>> eight-bit-*, then encoding it with utf-8 will return the same string
>>>> of bytes (except in a unibyte string rather than multibyte string).

>>> Here's an example:

>>> (encode-coding-string "\x80" 'utf-8)
>>> => "\302\200"

>> Duh!  Looks like a serious bug to me.
>> Handa-san, what's up with that?

> ??? \x80 == U+0080 is a valid Unicode character in "C1
> Controls" block.

Why was \x80 chosen to represent U+0080?
The problem is that it makes it impossible to reliably carry
byte streams embedded in multibyte strings.  Oh well, I guess that EBCDIC
and friends make that impossible anyway :-(

> However, I agree that the following is very questionable
> behaviour:

>>> (encode-coding-string (string-as-unibyte "\x80") 'utf-8)
>>> => "\302\200"

> But, that is a long standing problem, and should be fixed
> (if necessary) after the release.

It should be fixed by signalling an error: if the string is unibyte it's
already encoded.
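
A minimal sketch of what such a check could look like, written as a wrapper rather than as a change to `encode-coding-string' itself.  The name `my-strict-encode-coding-string' is hypothetical, and letting pure-ASCII unibyte strings through is an assumption on top of the proposal, on the grounds that ASCII bytes are unchanged by any ASCII-compatible coding system:

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of Emacs.
(defun my-strict-encode-coding-string (string coding-system)
  "Encode STRING with CODING-SYSTEM, refusing unibyte non-ASCII input.
A unibyte string holding bytes above 127 is assumed to be encoded
already, so an error is signalled instead of encoding it again."
  (when (and (not (multibyte-string-p string))
             ;; Assumption: pure-ASCII unibyte strings are harmless,
             ;; so only bytes above 127 trigger the error.
             (cl-some (lambda (byte) (> byte 127)) string))
    (error "Unibyte string looks already encoded: %S" string))
  (encode-coding-string string coding-system))

;; Usage:
;; (my-strict-encode-coding-string "abc" 'utf-8)                  ; ASCII is accepted
;; (my-strict-encode-coding-string (string #x80) 'utf-8)          ; multibyte U+0080 is encoded
;; (my-strict-encode-coding-string (unibyte-string #x80) 'utf-8)  ; signals an error
```

The wrapper only inspects the string's representation, so it cannot tell which coding system the unibyte bytes were produced with; it merely refuses to encode them a second time.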


        Stefan

