From: Kenichi Handa <handa@m17n.org>
Cc: miles@gnu.org
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 14:32:16 +0900 (JST)
Message-ID: <200302260532.OAA29294@etlken.m17n.org>
In-Reply-To: <200302260252.h1Q2qIK08490@rum.cs.yale.edu> (monnier+gnu/emacs@rum.cs.yale.edu)

In article <200302260252.h1Q2qIK08490@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> I consider this context-dependent meaning of unibyte strings
> to be a problem.  I understand why text in a unibyte buffer
> has such an ambiguous meaning and agree that it's difficult
> to avoid, but it's not a reason to carry over this difficulty
> to strings where it is not needed.

Why is it not needed?  Strings and buffers are not that
different; both are containers of characters.  If we get a
unibyte string from a unibyte buffer by buffer-substring,
how should we treat that string?

>>  In the former case, as it is given to encode-coding-string,
>>  it is a multibyte form by which emacs represents
>>  character(s), not a sequence of characters representing raw
>>  bytes.

> The problem is that the multibyteness of strings is not
> always as easy to guess/control.

I agree.

> For example: what is the multibyteness of

> 	(concat "\201" (format "%s" "hello"))
> and
> 	(concat "\201" (format "%s" 1))

The latter yields multibyte, but I think it's a bug.  I found
that "(format "%s" 1)" is implemented by using
prin1-to-string, and prin1-to-string prints an object to a
temporary buffer and gets that buffer string.  So, in a
multibyte session "(format "%s" 1)" yields a multibyte
string.  :-(

>>  In the latter case, as it is given to string-to-multibyte,
>>  it should be regarded as a sequence of characters representing
>>  raw bytes, thus the result of (string-to-multibyte
>>  "\201\300") is still a sequence of raw-bytes.  Encoding
>>  raw-bytes should yield the same raw-bytes.

> Indeed, that's what I and `setenv' would want.

>>  And, this behaviour of encode-coding-string on a unibyte
>>  string is a natural consequence of encode-coding-region in a
>>  unibyte buffer.

> As mentioned above, I understand why it works that way in buffers,
> but I don't think it has to work the same way for strings.

So, do you mean that you want this?

    If a unibyte buffer has \201\300 in the region FROM and TO,

    (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
	=> "\201\300"

    (encode-coding-region FROM TO 'iso-latin-1) changes the
    region to \300.

Isn't it more confusing?

By the way, I also really really hate this unibyte/multibyte
problem.  Sometimes I think I should have opposed the
introduction of such a concept more strongly.

    imagine there's no unibyte 
    it's easy if you try
    no bytes below us
    above us only chars
    imagine all the people living in multibyte

:-)

---
Ken'ichi HANDA
handa@m17n.org
