Re: eight-bit char handling in emacs-unicode


From: Kenichi Handa <handa@m17n.org>
Cc: jas@extundo.com, emacs-devel@gnu.org
Subject: Re: eight-bit char handling in emacs-unicode
Date: Wed, 26 Nov 2003 09:07:47 +0900 (JST)
Message-ID: <200311260007.JAA26617@etlken.m17n.org>
In-Reply-To: <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca> (message from Stefan Monnier on 25 Nov 2003 10:43:05 -0500)

In article <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>  It seems that you keep saying that "A does B, thus it's
>>  nonsense".  But I'm arguing that "A does C".

> Well, the thing is: I still don't understand what C is.
> From what I understand, you say that C is "a conversion from multibyte
> to a sequence of code-points",

Yes, that's what I said.

> but since the output is a unibyte string,
> that restricts it to cases where the code-points can be encoded in 8 bits,
> thus it doesn't sound very generic

Yes.  But I don't think whether it is generic is the point here.

> and I don't see any application for it
> (nor do I see any practical difference with using encode-coding-string
> since the output AFAIK would be the same).

My examples show that we can't use encode-coding-string.
How can we use encode-coding-string without knowing which
coding system to use?  I haven't heard your answer yet.

>>  It doesn't make sense because you treat the result as "a
>>  unibyte string encoded in Latin-1".

>>  It makes sense if you treat the result as "a unibyte string
>>  representing a sequence of Unicode code-points, one byte per
>>  code-point", doesn't it?

> But each byte can only represent the 0-255 subset of unicode code-points, in
> which case this is equivalent (practically speaking) to latin-1, isn't it?

Yes.  And in this case that covers every character the user
uses.

>>>  It'd make sense if the environment said "latin-1 when you can,
>>>  utf-8 otherwise" or something like that, but then we would use
>>>  encode-coding-string anyway.

>>  Such a coding system would itself be nonsense.

> I was not thinking of a coding-system, but just some encoding job,
> such as what is done when saving a buffer (where my .emacs does exactly
> that: try latin-1 first and utf-8 if that fails).
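The "try latin-1 first, utf-8 if that fails" saving strategy quoted above can be modelled roughly in Python. This is a hypothetical sketch (the name encode_for_saving is invented, and Python's str.encode stands in for Emacs's encoding machinery), not the actual .emacs code.  Note that once it falls back to utf-8, the number of bytes no longer matches the number of characters.

```python
def encode_for_saving(text):
    """Hypothetical model: try latin-1 first, fall back to utf-8."""
    try:
        return "latin-1", text.encode("latin-1")
    except UnicodeEncodeError:
        # Some character is outside U+0000..U+00FF.
        return "utf-8", text.encode("utf-8")


for sample in ["caf\u00e9", "caf\u00e9 \u0416"]:  # "café", "café Ж"
    name, data = encode_for_saving(sample)
    print(name, len(sample), len(data))
# latin-1 4 4
# utf-8 6 8
```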

Ah, I see.  But my understanding is that
string-make-unibyte and string-make-multibyte are designed
never to change the number of characters, so that the
difference between unibyte and multibyte strings is
transparent to Lisp code.  Because of that restriction,
unsupported characters are sometimes handled incorrectly.
But I think Richard's design policy was that incorrect
handling of unsupported characters is better than a possibly
more disastrous error caused by a change in the number of
characters.
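The length-preserving property described above can be illustrated with a rough Python model.  It is not Emacs's C implementation: the name make_unibyte_lossy is hypothetical, and keeping only the low 8 bits of an unsupported character is an assumed stand-in for "handled incorrectly".  A variable-length encoding such as UTF-8, by contrast, changes the length.

```python
def make_unibyte_lossy(s: str) -> bytes:
    """Hypothetical model: one byte per character, length never changes.

    Characters above U+00FF cannot be represented; as an assumption for
    illustration, only the low 8 bits of their code point are kept.
    """
    return bytes(ord(c) & 0xFF for c in s)


text = "caf\u00e9 \u0416"          # "café Ж": 6 characters
lossy = make_unibyte_lossy(text)   # U+0416 silently becomes 0x16
utf8 = text.encode("utf-8")        # variable length: 2 bytes each for é and Ж

print(len(text), len(lossy), len(utf8))  # 6 6 8
```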

>>  Do you agree with having string-make-unibyte if it signals an error on
>>  non-Latin-1 characters?

> Of course: that's pretty much what I suggested: string-make-unibyte only
> accepts multibyte chars that correspond to "bytes".

I agree with that.  But it only changes the function's
behaviour in the error case; it doesn't change the concept
of what it does.
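The stricter behaviour agreed on here can be sketched the same way: only characters that correspond to bytes (U+0000..U+00FF) are accepted, and anything else signals an error.  Again this is a hypothetical Python model (make_unibyte_strict is an invented name).  For accepted input its output coincides with Latin-1 encoding; only the error path differs from the lossy version.

```python
def make_unibyte_strict(s: str) -> bytes:
    """Hypothetical model: accept only characters that correspond to bytes."""
    out = bytearray()
    for i, c in enumerate(s):
        cp = ord(c)
        if cp > 0xFF:
            # Signal an error instead of silently mangling the character.
            raise ValueError(f"character U+{cp:04X} at index {i} is not a byte")
        out.append(cp)
    return bytes(out)


# For supported input the result equals Latin-1 encoding.
assert make_unibyte_strict("caf\u00e9") == "caf\u00e9".encode("latin-1")
```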

>>>  I just don't know of a concrete case where it makes sense to use
>>>  string-make-unibyte.

>>  I'll paraphrase my previous example as follows:

>>    It is perfectly possible to live in an environment
>>    where only the Unicode characters U+0000..U+00FF are
>>    used, but the only coding system used is utf-8.

>>  But I don't claim that this is a realistic case.

>>  Another non-realistic but concrete case is:

>>    Use only the charset iso-8859-5 and the encoding CTEXT.

> I don't see any use of string-make-unibyte in your two examples.

Again, I'd like to ask how we can use encode-coding-string
without knowing the proper coding system in each case.
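The first example can be made concrete in Python.  The text uses only U+0000..U+00FF, yet encoding it with the environment's only coding system, utf-8, produces different bytes from the one-byte-per-code-point representation.  The snippet is only illustrative; Python's str.encode stands in for encode-coding-string.

```python
text = "na\u00efve"  # "naïve": every character is in U+0000..U+00FF

# One byte per code point (the representation under discussion).
code_points = bytes(ord(c) for c in text)

# Encoding with the environment's only coding system.
utf8 = text.encode("utf-8")

print(list(code_points))  # [110, 97, 239, 118, 101]
print(list(utf8))         # [110, 97, 195, 175, 118, 101]
```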

> And "having string-make-unibyte if it signals an error on non-Latin-1
> characters" means that the second example can't be used any more.

In the second case, of course, the "supported characters"
are those included in the charset iso-8859-5, and
string-make-unibyte should accept them.  Again, the result
is the same as encoding with the coding system iso-8859-5,
but here we only know about the coding system CTEXT.
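The second case can be sketched by replacing the fixed 0-255 range with the repertoire of one charset.  In this hypothetical Python model, make_unibyte_charset and its lookup table are invented for illustration, and the table is built from Python's iso8859_5 codec.  For accepted text the result matches encoding with iso-8859-5.  CTEXT output, by contrast, would also carry ISO 2022 escape sequences and so generally differs.

```python
# Map every character of ISO 8859-5 to the byte that represents it.
_ISO8859_5 = {ch: b for b, ch in enumerate(bytes(range(256)).decode("iso8859_5"))}


def make_unibyte_charset(s: str, table=_ISO8859_5) -> bytes:
    """Hypothetical model: accept only characters of one charset."""
    out = bytearray()
    for i, c in enumerate(s):
        if c not in table:
            raise ValueError(f"U+{ord(c):04X} at index {i} is not in the charset")
        out.append(table[c])
    return bytes(out)


word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет"
assert make_unibyte_charset(word) == word.encode("iso8859_5")
```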

---
Ken'ichi HANDA
handa@m17n.org
