From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
Cc: monnier+gnu/emacs@rum.cs.yale.edu
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 00:50:27 -0500
Message-ID: <200302260550.h1Q5oSc08967@rum.cs.yale.edu>
In-Reply-To: 200302260532.OAA29294@etlken.m17n.org
> > I consider this context-dependent meaning of unibyte strings
> > to be a problem. I understand why text in a unibyte buffer
> > has such an ambiguous meaning and agree that it's difficult
> > to avoid, but it's not a reason to carry over this difficulty
> > to strings where it is not needed.
>
> Why is it not needed? Strings and buffers are not that
> different, both are containers of characters.
They are used differently. Operations on strings generally apply to the
whole string: you can only encode/decode a whole string at a time.
> If we get a unibyte string from a unibyte buffer by buffer-substring,
> how should we treat that string?
Like any other unibyte string: as a sequence of raw bytes.
If you want to treat it as a sequence of characters, then
you need to pass it through `string-as-multibyte'.
In buffers, there is sometimes a need to represent multibyte chars
inside a unibyte buffer because only part of the buffer is
decoded. For a string, that can be avoided. You can make sure
that if it is decoded it's a multibyte string and if it's not
then it's a unibyte string.
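To illustrate that convention, here is a minimal sketch, assuming a
reasonably current Emacs. The function name `my-multibyteness-demo' and
the variables `raw' and `text' are made up for this example and are not
part of Emacs.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-multibyteness-demo ()
  "Return the multibyteness of a raw, a decoded, and a re-encoded string."
  (let* ((raw "\351")                                    ; one raw byte: unibyte
         (text (decode-coding-string raw 'iso-latin-1))) ; decoded: multibyte
    (list (multibyte-string-p raw)
          (multibyte-string-p text)
          ;; Encoding gives back a sequence of raw bytes, i.e. unibyte.
          (multibyte-string-p (encode-coding-string text 'iso-latin-1)))))
```

Evaluating (my-multibyteness-demo) should give (nil t nil): the string
is unibyte while it holds bytes and multibyte once it holds characters.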
> > For example: what is the multibyteness of
>
> > (concat "\201" (format "%s" "hello"))
> > and
> > (concat "\201" (format "%s" 1))
>
> The latter yields multibyte, but I think it's a bug. I found
> that "(format "%s" 1)" is implemented by using
> prin1-to-string, and prin1-to-string prints an object to a
> temporary buffer and gets that buffer string. So, in a
> multibyte session "(format "%s" 1)" yields a multibyte
> string. :-(
I know: I bumped into it yesterday while playing around with tar-mode.
How about the attached patch?
> So, do you mean that you want this?
>
> If a unibyte buffer has \201\300 in the region FROM and TO,
>
> (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
> => "\201\300"
>
> (encode-coding-region FROM TO 'iso-latin-1) changes the
> region to \300.
Yes, I guess I'd be happy with it.
> Isn't it more confusing?
Not to me.
> By the way, I also really really hate this unibyte/multibyte
> problem. Sometimes I think I should have opposed the
> introduction of such a concept more strongly.
But it's pretty damn handy for binary data.
Stefan
PS: I wish there was a way to swap two buffers' contents so that
tar-mode could swap the (potentially very large) data to
a helper buffer (without needing to copy this large data)
and then use multibyte for the display and unibyte for
the helper buffer.
Index: print.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/print.c,v
retrieving revision 1.184
diff -u -r1.184 print.c
--- print.c 4 Feb 2003 14:03:13 -0000 1.184
+++ print.c 26 Feb 2003 05:43:26 -0000
@@ -774,9 +774,12 @@
   /* Make Vprin1_to_string_buffer be the default buffer after PRINTFINSH */
   PRINTFINISH;
   set_buffer_internal (XBUFFER (Vprin1_to_string_buffer));
+  if (ZV == ZV_BYTE)
+    Fset_buffer_multibyte (Qnil);
   object = Fbuffer_string ();
   Ferase_buffer ();
+  Fset_buffer_multibyte (Qt);
   set_buffer_internal (old);
   Vdeactivate_mark = tem;