From: Kenichi Handa <handa@m17n.org>
Cc: jas@extundo.com, emacs-devel@gnu.org
Subject: Re: eight-bit char handling in emacs-unicode
Date: Wed, 26 Nov 2003 09:07:47 +0900 (JST)
Message-ID: <200311260007.JAA26617@etlken.m17n.org>
In-Reply-To: <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca> (message from Stefan Monnier on 25 Nov 2003 10:43:05 -0500)
In article <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>> It seems that you keep saying that "A does B, thus it's
>> nonsense". But, I'm arguing that "A does C".
> Well, the thing is: I still don't understand what is C.
> From what I understand, you say that C is "a conversion from multibyte
> to a sequence of code-points",
Yes, that's what I said.
> but since the output is a unibyte string,
> that restricts it to cases where the code-points can be encoded in 8 bits,
> thus it doesn't sound very generic
Yes. But I thought that whether it is generic or not is not
the point here.
> and I don't see any application for it
> (nor do I see any practical difference with using encode-coding-string
> since the output AFAIK would be the same).
My examples show that we can't use encode-coding-string.
How can we use encode-coding-string without knowing what
coding system to use? I haven't heard your answer yet.
>> It doesn't make sense because you treat the result as "a
>> unibyte string encoded in Latin-1".
>> It makes sense if you treat the result as "a unibyte string
>> in which each byte represents one Unicode code-point",
>> doesn't it?
> But each byte can only represent the 0-255 subset of unicode code-points, in
> which case this is equivalent (practically speaking) to latin-1, isn't it ?
Yes. And that covers all characters the user uses in this
case.
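
As a rough illustration of the point agreed on here (in Python, not Emacs Lisp, and not how Emacs implements it): storing each character's code point in a single byte gives the same bytes as Latin-1 encoding whenever every character lies in U+0000..U+00FF. The helper name `code_points_to_bytes` is hypothetical.

```python
def code_points_to_bytes(text: str) -> bytes:
    """Store each character's Unicode code point in one byte.

    Hypothetical helper for illustration only. bytes() raises
    ValueError when a code point does not fit in 8 bits.
    """
    return bytes(ord(ch) for ch in text)


sample = "caf\u00e9"  # every character is within U+0000..U+00FF
# Within that range the result is byte-for-byte the Latin-1 encoding.
assert code_points_to_bytes(sample) == sample.encode("latin-1")
```

The difference between the two views only shows up in how the result is described, which is why no coding system has to be named to produce it.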
>>> It'd make sense if the environment said "latin-1 when you can,
>>> utf-8 otherwise" or something like that, but then we would use
>>> encode-coding-string anyway.
>> Having such a coding system is itself nonsense.
> I was not thinking of a coding-system, but just some encoding job,
> such as what is done when saving a buffer (where my .emacs does exactly
> that: try latin-1 first and utf-8 if that fails).
Ah, I see. But, my understanding is that
string-make-unibyte/multibyte are designed not to change the
number of characters, so that the difference between unibyte
and multibyte is transparent in Lisp. That restriction
means that unsupported characters are handled incorrectly.
But, I think Richard's design policy was that incorrect
handling of unsupported characters is better than a possibly
more disastrous error caused by a change in the number of
characters.
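
The "Latin-1 when you can, UTF-8 otherwise" job and the character-count concern can be sketched together as follows. This is Python for illustration only; `encode_for_saving` is a hypothetical name, not anything from Emacs or from the .emacs mentioned above.

```python
from typing import Tuple


def encode_for_saving(text: str) -> Tuple[str, bytes]:
    """Try Latin-1 first; fall back to UTF-8 if a character does not fit."""
    try:
        return "latin-1", text.encode("latin-1")
    except UnicodeEncodeError:
        return "utf-8", text.encode("utf-8")


name, data = encode_for_saving("caf\u00e9")
# Latin-1 keeps one byte per character, so the length is unchanged.
assert (name, len(data)) == ("latin-1", 4)

name, data = encode_for_saving("caf\u00e9 \u03bb")
# The UTF-8 fallback needs two bytes for some characters, so the
# byte count (8) no longer matches the character count (6).
assert (name, len(data)) == ("utf-8", 8)
```

The fallback is fine for an encoding job such as saving a file, but it breaks the one-character-to-one-byte correspondence that a length-preserving conversion is meant to keep.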
>> Do you agree with having string-make-unibyte if it signals an error on
>> non-Latin-1 characters?
> Of course: that's pretty much what I suggested: string-make-unibyte only
> accepts multibyte chars that correspond to "bytes".
I agree with that. But, it only changes the behaviour of
the function in the error case. It doesn't change the
concept of what it does.
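
To make "only the error case changes" concrete, here is a minimal Python sketch with a hypothetical `make_unibyte` function. The strict mode signals an error on an unsupported character; the lenient mode keeps the character count but stores a wrong byte. Truncating to the low 8 bits is an arbitrary lossy choice made for this illustration, not a claim about what Emacs does.

```python
def make_unibyte(text: str, strict: bool = True) -> bytes:
    """Convert text to one byte per character (illustrative sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            if strict:
                raise ValueError(f"U+{cp:04X} does not fit in one byte")
            # Assumption for illustration: keep the length, lose the character.
            cp &= 0xFF
        out.append(cp)
    return bytes(out)


# Supported characters give the same result in both modes.
assert make_unibyte("caf\u00e9") == make_unibyte("caf\u00e9", strict=False)

# An unsupported character is mishandled, but the length is preserved.
assert len(make_unibyte("a\u03bb", strict=False)) == 2
```

For supported input the two modes are indistinguishable, so the concept of the conversion is the same in both.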
>>> I just don't know of a concrete case where it makes sense to use
>>> string-make-unibyte.
>> I'll paraphrase my previous example as this:
>> It is perfectly possible to live in an environment
>> where only the characters U+0000..U+00FF of Unicode are
>> used and the only coding system in use is utf-8.
>> But, I don't claim that the above is a realistic case.
>> Another non-realistic but concrete case is:
>> Use only the charset iso-8859-5 and the encoding CTEXT.
> I don't see any use of string-make-unibyte in your two examples.
Again, I'd like to ask how to use encode-coding-string
without knowing the proper coding-system in each case.
> And "having string-make-unibyte if it signals an error on non-Latin-1
> characters" means that the second example can't be used any more.
In the second case, of course "supported characters" are
those included in the charset iso-8859-5, and
string-make-unibyte should accept them. Again, the result
is the same as encoding with the coding system iso-8859-5,
but we only know about the coding system CTEXT here.
---
Ken'ichi HANDA
handa@m17n.org