unofficial mirror of emacs-devel@gnu.org 
* setenv -> locale-coding-system cannot handle ASCII?!
@ 2003-02-25  0:18 Sam Steingold
  2003-02-25  6:34 ` Kenichi Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Sam Steingold @ 2003-02-25  0:18 UTC (permalink / raw)


GNU Emacs 21.3.50.7 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2003-02-24 on loiso.podval.org

Debugger entered--Lisp error: (error "Can't encode
  `SSH_AUTH_SOCK=/tmp/ssh-XXgfCthd/agent.1191' with
  `locale-coding-system'") 
  signal(error ("Can't encode `SSH_AUTH_SOCK=/tmp/ssh-XXgfCthd/agent.1191' 
                 with `locale-coding-system'")) 
  error("Can't encode `%s=%s' with `locale-coding-system'"
        "SSH_AUTH_SOCK" "/tmp/ssh-XXgfCthd/agent.1191") 
  setenv("SSH_AUTH_SOCK" "/tmp/ssh-XXgfCthd/agent.1191")


-- 
Sam Steingold (http://www.podval.org/~sds) running RedHat8 GNU/Linux
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.palestine-central.com/links.html>
God had a deadline, so He wrote it all in Lisp.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-25  0:18 setenv -> locale-coding-system cannot handle ASCII?! Sam Steingold
@ 2003-02-25  6:34 ` Kenichi Handa
  2003-02-25  6:47   ` Miles Bader
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2003-02-25  6:34 UTC (permalink / raw)
  Cc: d.love

In article <m3lm05ciwr.fsf@loiso.podval.org>, Sam Steingold <sds@gnu.org> writes:
> GNU Emacs 21.3.50.7 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
>  of 2003-02-24 on loiso.podval.org

> Debugger entered--Lisp error: (error "Can't encode
>   `SSH_AUTH_SOCK=/tmp/ssh-XXgfCthd/agent.1191' with
>   `locale-coding-system'") 
>   signal(error ("Can't encode `SSH_AUTH_SOCK=/tmp/ssh-XXgfCthd/agent.1191' 
>                  with `locale-coding-system'")) 

Miles Bader <miles@lsi.nec.co.jp> writes:
> I tried to do this using today's CVS:

>   M-x setenv RET lk25 RET /proj/soft2/uclinux/uclinux/linux-2.5.63-uc0 RET

> and got this error:

>   setenv: Can't encode `lk25=/proj/soft2/uclinux/uclinux/linux-2.5.63-uc0' with `locale-coding-system'

I've just installed this change.  I think it should fix the
above problems.  Please try the new code.

2003-02-25  Kenichi Handa  <handa@m17n.org>

	* env.el (setenv): Fix previous change.

*** env.el.~1.28.~	Tue Feb 25 09:43:17 2003
--- env.el	Tue Feb 25 15:09:45 2003
***************
*** 121,133 ****
  	     nil
  	     t))))
    (if (and (multibyte-string-p variable) locale-coding-system)
!       (unless (memq (coding-system-base locale-coding-system)
! 		    (find-coding-systems-string (concat variable value)))
! 	(error "Can't encode `%s=%s' with `locale-coding-system'"
! 	       variable (or value "")))
!     (unless (memq 'undecided (find-coding-systems-string variable))
!       (error "Can't encode `%s=%s' with unspecified `locale-coding-system'"
! 	     variable (or value ""))))
    (if unset 
        (setq value nil)
      (if substitute-env-vars
--- 121,131 ----
  	     nil
  	     t))))
    (if (and (multibyte-string-p variable) locale-coding-system)
!       (let ((codings (find-coding-systems-string (concat variable value))))
! 	(unless (or (eq 'undecided (car codings))
! 		    (memq (coding-system-base locale-coding-system) codings))
! 	  (error "Can't encode `%s=%s' with `locale-coding-system'"
! 		 variable (or value "")))))
    (if unset 
        (setq value nil)
      (if substitute-env-vars

> What's weird is that I _don't_ get an error if I invoke the same command
> via C-x ESC ESC (repeat-complex-command).

> Looking at the code for `setenv,' I'm not sure what's going on; in this
> snippet (which is the only place the above error occurs):

>   (if (and (multibyte-string-p variable) locale-coding-system)
>       (unless (memq (coding-system-base locale-coding-system)
> 		    (find-coding-systems-string (concat variable value)))
> 	(error "Can't encode `%s=%s' with `locale-coding-system'"
> 	       variable (or value "")))
>      ...

> the call to multibyte-string-p seems to be odd -- if I just evaluate
> (multibyte-string-p "lk25") it returns nil, but if I get an error
> backtrace so that `variable' is bound to "lk25", and evaluate
> (multibyte-string-p variable), then it returns t!

> Since (find-coding-systems-string (concat variable value)) always seems
> to return just '(undecided), something seems dreadfully wrong.

> [I'm confused about what `multibyte-string-p' actually _means_, by the
> way -- shouldn't it only ever return t if the string contains non-ascii
> characters?]

Looong ago I proposed the same thing to Richard.  His answer
was that the multibyteness of a string should follow the
source of the string.  If it is a substring of a multibyte
buffer/string, it should be multibyte.  Only when we can't
determine the multibyteness of a source (e.g. lisp reader),
we should determine the multibyteness of a string from its
contents.

And, as read-from-minibuffer, etc. usually inherit the
multibyteness of the current buffer, what you type after M-x
setenv is also multibyte even if it contains only ASCII
characters.

But, as C-x ESC ESC reads the s-exp with the Lisp reader, the
ASCII strings in the s-exp will be unibyte.  Thus, you didn't get
an error.
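
The difference between the two entry points can be checked directly.
The following is a minimal sketch; the two function names are
hypothetical and exist only for illustration, and the buffer is made
multibyte explicitly so that the result does not depend on session
defaults.

```elisp
;; Hypothetical helpers; not part of env.el.

(defun my-literal-multibyte-p ()
  "Return non-nil if an ASCII literal read by the Lisp reader is multibyte."
  (multibyte-string-p "lk25"))

(defun my-buffer-text-multibyte-p ()
  "Return non-nil if ASCII text taken from a multibyte buffer is multibyte."
  (with-temp-buffer
    (set-buffer-multibyte t)            ; make the assumption explicit
    (insert "lk25")
    (multibyte-string-p (buffer-string))))
```

The first function is expected to return nil and the second t, which
mirrors the C-x ESC ESC versus M-x setenv difference.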

---
Ken'ichi HANDA
handa@m17n.org


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-25  6:34 ` Kenichi Handa
@ 2003-02-25  6:47   ` Miles Bader
  2003-02-26  0:58     ` Kenichi Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Miles Bader @ 2003-02-25  6:47 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> > [I'm confused about what `multibyte-string-p' actually _means_, by the
> > way -- shouldn't it only ever return t if the string contains non-ascii
> > characters?]
> 
> Looong ago I proposed the same thing to Richard.  His answer
> was that the multibyteness of a string should follow the
> source of the string.

That would seem to make it almost useless for a lisp programmer...
What is the purpose of using it in a function like `setenv'?

Is it just an efficiency hack?  If so, why not just move the test into
find-coding-systems-string [or whatever function actually does the work
for it] so that ordinary programs don't have to worry about such
silliness?

-Miles
-- 
`Cars give people wonderful freedom and increase their opportunities.
 But they also destroy the environment, to an extent so drastic that
 they kill all social life' (from _A Pattern Language_)


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-25  6:47   ` Miles Bader
@ 2003-02-26  0:58     ` Kenichi Handa
  2003-02-26  2:11       ` Stefan Monnier
  2003-02-26 23:25       ` Richard Stallman
  0 siblings, 2 replies; 27+ messages in thread
From: Kenichi Handa @ 2003-02-26  0:58 UTC (permalink / raw)
  Cc: emacs-devel

In article <buo65r8ua9i.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  > [I'm confused about what `multibyte-string-p' actually _means_, by the
>>  > way -- shouldn't it only ever return t if the string contains non-ascii
>>  > characters?]
>>  
>>  Looong ago I proposed the same thing to Richard.  His answer
>>  was that the multibyteness of a string should follow the
>>  source of the string.

> That would seem to make it almost useless for a lisp programmer...
> What is the purpose of using it in a function like `setenv'?
> Is it just an efficiency hack?  If so, why not just move the test into
> find-coding-systems-string [or whatever function actually does the work
> for it] so that ordinary programs don't have to worry about such
> silliness?

For this part:

  (if (and (multibyte-string-p variable) locale-coding-system)
      (let ((codings (find-coding-systems-string (concat variable value))))
	(unless (or (eq 'undecided (car codings))
		    (memq (coding-system-base locale-coding-system) codings))
	  (error "Can't encode `%s=%s' with `locale-coding-system'"
		 variable (or value "")))))

yes, multibyte-string-p is just for efficiency.  We can
simplify it to:

  (let ((codings (find-coding-systems-string (concat variable value))))
    (unless (or (eq 'undecided (car codings))
		(memq (coding-system-base locale-coding-system) codings))
      (error "Can't encode `%s=%s' with `locale-coding-system'"
	     variable (or value ""))))
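
The simplified check can also be packaged as a predicate.  This is
only a sketch: `my-encodable-p' is a hypothetical name, and the coding
system is passed as an argument instead of being read from
`locale-coding-system'.

```elisp
(defun my-encodable-p (string coding-system)
  "Return non-nil if STRING can be encoded with CODING-SYSTEM.
Hypothetical helper mirroring the check in `setenv'."
  (let ((codings (find-coding-systems-string string)))
    ;; For ASCII-only text the list starts with `undecided',
    ;; meaning that any coding system will do.
    (or (eq 'undecided (car codings))
        (and (memq (coding-system-base coding-system) codings) t))))
```

An ASCII-only string such as "SSH_AUTH_SOCK=/tmp/agent" passes for any
coding system, while a string containing a character outside the
repertoire of CODING-SYSTEM does not.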

But, in this part,

  (if (multibyte-string-p variable)
      (setq variable (encode-coding-string variable locale-coding-system)))

multibyte-string-p is mandatory because encode-coding-string
will change the byte-sequence of `variable' even if it is
unibyte.
Ex. (encode-coding-string "\201\300" 'iso-latin-1) => "\300"
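
The guard can be wrapped in a small helper so that unibyte input
passes through untouched.  A minimal sketch, with
`my-encode-if-multibyte' as a hypothetical name:

```elisp
(defun my-encode-if-multibyte (string coding-system)
  "Encode STRING with CODING-SYSTEM only when STRING is multibyte.
A unibyte STRING is treated as already encoded and returned as is.
Hypothetical helper; not part of env.el."
  (if (multibyte-string-p string)
      (encode-coding-string string coding-system)
    string))
```

With this guard a unibyte argument such as "\201\300" is returned
unchanged, whatever encode-coding-string itself would do with it.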

By the way, I still think it's better to have multibyte
strings in process-environment in a multibyte session.  But,
it seems that Richard is not convinced.  So, we need
something like Dave's change.

---
Ken'ichi HANDA
handa@m17n.org


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  0:58     ` Kenichi Handa
@ 2003-02-26  2:11       ` Stefan Monnier
  2003-02-26  2:34         ` Kenichi Handa
  2003-02-26 23:25       ` Richard Stallman
  1 sibling, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2003-02-26  2:11 UTC (permalink / raw)
  Cc: miles

>   (if (multibyte-string-p variable)
>       (setq variable (encode-coding-string variable locale-coding-system)))
> 
> multibyte-string-p is mandatory because encode-coding-string
> will change the byte-sequence of `variable' even if it is
> unibyte.
> Ex. (encode-coding-string "\201\300" 'iso-latin-1) => "\300"

I find this behavior annoying because it makes the emacs-mule
encoding appear in a situation where it is not mentioned.
I wish that

    (encode-coding-string "\201\300" 'iso-latin-1)
and
    (encode-coding-string (string-to-multibyte "\201\300") 'iso-latin-1)

returned the same value.


	Stefan


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  2:11       ` Stefan Monnier
@ 2003-02-26  2:34         ` Kenichi Handa
  2003-02-26  2:52           ` Stefan Monnier
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2003-02-26  2:34 UTC (permalink / raw)
  Cc: miles

In article <200302260211.h1Q2BJl08373@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>    (if (multibyte-string-p variable)
>>        (setq variable (encode-coding-string variable locale-coding-system)))
>>  
>>  multibyte-string-p is mandatory because encode-coding-string
>>  will change the byte-sequence of `variable' even if it is
>>  unibyte.
>>  Ex. (encode-coding-string "\201\300" 'iso-latin-1) => "\300"

> I find this behavior annoying because it makes the emacs-mule
> encoding appear in a situation where it is not mentioned.
> I wish that

>     (encode-coding-string "\201\300" 'iso-latin-1)
> and
>     (encode-coding-string (string-to-multibyte "\201\300") 'iso-latin-1)

> returned the same value.

Why?  As I wrote before, what the bytes of a unibyte string
mean depends on the context.

In the former case, as it is given to encode-coding-string,
it is a multibyte form by which emacs represents
character(s), not a sequence of characters representing raw
bytes.

In the latter case, as it is given to string-to-multibyte,
it should be regarded as a sequence of characters representing
raw bytes, thus the result of (string-to-multibyte
"\201\300") is still a sequence of raw-bytes.  Encoding
raw-bytes should yield the same raw-bytes.

And, this behaviour of encode-coding-string on a unibyte
string is a natural consequence of encode-coding-region in a
unibyte buffer.

---
Ken'ichi HANDA
handa@m17n.org


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  2:34         ` Kenichi Handa
@ 2003-02-26  2:52           ` Stefan Monnier
  2003-02-26  5:32             ` Kenichi Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2003-02-26  2:52 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> >>    (if (multibyte-string-p variable)
> >>        (setq variable (encode-coding-string variable locale-coding-system)))
> >>  
> >>  multibyte-string-p is mandatory because encode-coding-string
> >>  will change the byte-sequence of `variable' even if it is
> >>  unibyte.
> >>  Ex. (encode-coding-string "\201\300" 'iso-latin-1) => "\300"
> 
> > I find this behavior annoying because it makes the emacs-mule
> > encoding appear in a situation where it is not mentioned.
> > I wish that
> 
> >     (encode-coding-string "\201\300" 'iso-latin-1)
> > and
> >     (encode-coding-string (string-to-multibyte "\201\300") 'iso-latin-1)
> 
> > returned the same value.
> 
> Why?  As I wrote before, what the bytes of a unibyte string
> mean depends on the context.

I consider this context-dependent meaning of unibyte strings
to be a problem.  I understand why text in a unibyte buffer
has such an ambiguous meaning and agree that it's difficult
to avoid, but it's not a reason to carry over this difficulty
to strings where it is not needed.

> In the former case, as it is given to encode-coding-string,
> it is a multibyte form by which emacs represents
> character(s), not a sequence of characters representing raw
> bytes.

The problem is that the multibyteness of strings is not
always as easy to guess/control.  For example: what is the
multibyteness of

	(concat "\201" (format "%s" "hello"))
and
	(concat "\201" (format "%s" 1))

> In the latter case, as it is given to string-to-multibyte,
> it should be regarded as a sequence of characters representing
> raw bytes, thus the result of (string-to-multibyte
> "\201\300") is still a sequence of raw-bytes.  Encoding
> raw-bytes should yield the same raw-bytes.

Indeed, that's what I and `setenv' would want.

> And, this behaviour of encode-coding-string on a unibyte
> string is a natural consequence of encode-coding-region in a
> unibyte buffer.

As mentioned above, I understand why it works that way in buffers,
but I don't think it has to work the same way for strings.


	Stefan


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  2:52           ` Stefan Monnier
@ 2003-02-26  5:32             ` Kenichi Handa
  2003-02-26  5:50               ` Stefan Monnier
  2003-02-26 23:26               ` Richard Stallman
  0 siblings, 2 replies; 27+ messages in thread
From: Kenichi Handa @ 2003-02-26  5:32 UTC (permalink / raw)
  Cc: miles

In article <200302260252.h1Q2qIK08490@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> I consider this context-dependent meaning of unibyte strings
> to be a problem.  I understand why text in a unibyte buffer
> has such an ambiguous meaning and agree that it's difficult
> to avoid, but it's not a reason to carry over this difficulty
> to strings where it is not needed.

Why is it not needed?  Strings and buffers are not that
different, both are containers of characters.  If we get a
unibyte string from a unibyte buffer by buffer-substring,
how should we treat that string?

>>  In the former case, as it is given to encode-coding-string,
>>  it is a multibyte form by which emacs represents
>>  character(s), not a sequence of characters representing raw
>>  bytes.

> The problem is that the multibyteness of strings is not
> always as easy to guess/control.

I agree.

> For example: what is the multibyteness of

> 	(concat "\201" (format "%s" "hello"))
> and
> 	(concat "\201" (format "%s" 1))

The latter yields multibyte, but I think it's a bug.  I found
that "(format "%s" 1)" is implemented by using
prin1-to-string, and prin1-to-string prints an object to a
temporary buffer and gets that buffer string.  So, in a
multibyte session "(format "%s" 1)" yields a multibyte
string.  :-(

>>  In the latter case, as it is given to string-to-multibyte,
>>  it should be regarded as a sequence of characters representing
>>  raw bytes, thus the result of (string-to-multibyte
>>  "\201\300") is still a sequence of raw-bytes.  Encoding
>>  raw-bytes should yield the same raw-bytes.

> Indeed, that's what I and `setenv' would want.

>>  And, this behaviour of encode-coding-string on a unibyte
>>  string is a natural consequence of encode-coding-region in a
>>  unibyte buffer.

> As mentioned above, I understand why it works that way in buffers,
> but I don't think it has to work the same way for strings.

So, do you mean that you want this?

    If a unibyte buffer has \201\300 in the region FROM and TO,

    (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
	=> "\201\300"

    (encode-coding-region FROM TO 'iso-latin-1) changes the
    region to \300.

Isn't it more confusing?

By the way, I also really really hate this unibyte/multibyte
problem.  Sometimes I think I should have opposed the
introduction of such a concept more strongly.

    imagine there's no unibyte 
    it's easy if you try
    no bytes below us
    above us only chars
    imagine all the people living in multibyte

:-)

---
Ken'ichi HANDA
handa@m17n.org


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  5:32             ` Kenichi Handa
@ 2003-02-26  5:50               ` Stefan Monnier
  2003-02-26  7:49                 ` Kenichi Handa
  2003-02-26 23:26                 ` Richard Stallman
  2003-02-26 23:26               ` Richard Stallman
  1 sibling, 2 replies; 27+ messages in thread
From: Stefan Monnier @ 2003-02-26  5:50 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> > I consider this context-dependent meaning of unibyte strings
> > to be a problem.  I understand why text in a unibyte buffer
> > has such an ambiguous meaning and agree that it's difficult
> > to avoid, but it's not a reason to carry over this difficulty
> > to strings where it is not needed.
> 
> Why is it not needed?  Strings and buffers are not that
> different, both are containers of characters.

They are used differently.  Operations on strings generally apply to the
whole string: you can only encode/decode a whole string at a time.

> If we get a unibyte string from a unibyte buffer by buffer-substring,
> how should we treat that string?

Like any other unibyte string: as a sequence of raw bytes.
If you want to treat it as a sequence of characters, then
you need to pass it through `string-as-multibyte'.

In buffers, there is sometimes a need to represent multibyte chars
inside a unibyte buffer because only part of the buffer is
decoded.  For a string, that can be avoided.  You can make sure
that if it is decoded it's a multibyte string and if it's not
then it's a unibyte string.

> > For example: what is the multibyteness of
> 
> > 	(concat "\201" (format "%s" "hello"))
> > and
> > 	(concat "\201" (format "%s" 1))
> 
> The latter yields multibyte, but I think it's a bug.  I found
> that "(format "%s" 1)" is implemented by using
> prin1-to-string, and prin1-to-string prints an object to a
> temporary buffer and gets that buffer string.  So, in a
> multibyte session "(format "%s" 1)" yields a multibyte
> string.  :-(

I know: I bumped into it yesterday while playing around with tar-mode.
How about the attached patch ?

> So, do you mean that you want this?
> 
>     If a unibyte buffer has \201\300 in the region FROM and TO,
> 
>     (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
> 	=> "\201\300"
> 
>     (encode-coding-region FROM TO 'iso-latin-1) changes the
>     region to \300.

Yes, I guess I'd be happy with it.

> Isn't it more confusing?

Not to me.

> By the way, I also really really hate this unibyte/multibyte
> problem.  Sometimes I think I should have opposed the
> introduction of such a concept more strongly.

But it's pretty damn handy for binary data.


	Stefan


PS: I wish there was a way to swap two buffers' contents so that
    tar-mode could swap the (potentially very large) data to
    a helper buffer (without needing to copy this large data)
    and then use multibyte for the display and unibyte for
    the helper buffer.


Index: print.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/print.c,v
retrieving revision 1.184
diff -u -r1.184 print.c
--- print.c	4 Feb 2003 14:03:13 -0000	1.184
+++ print.c	26 Feb 2003 05:43:26 -0000
@@ -774,9 +774,12 @@
   /* Make Vprin1_to_string_buffer be the default buffer after PRINTFINSH */
   PRINTFINISH;
   set_buffer_internal (XBUFFER (Vprin1_to_string_buffer));
+  if (ZV == ZV_BYTE)
+    Fset_buffer_multibyte (Qnil);
   object = Fbuffer_string ();
 
   Ferase_buffer ();
+  Fset_buffer_multibyte (Qt);
   set_buffer_internal (old);
 
   Vdeactivate_mark = tem;


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  5:50               ` Stefan Monnier
@ 2003-02-26  7:49                 ` Kenichi Handa
  2003-02-26  8:05                   ` Kenichi Handa
                                     ` (3 more replies)
  2003-02-26 23:26                 ` Richard Stallman
  1 sibling, 4 replies; 27+ messages in thread
From: Kenichi Handa @ 2003-02-26  7:49 UTC (permalink / raw)
  Cc: miles

In article <200302260550.h1Q5oSc08967@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>  Why is it not needed?  Strings and buffers are not that
>>  different, both are containers of characters.

> They are used differently.  Operations on strings generally apply to the
> whole string: you can only encode/decode a whole string at a time.

That's because of the limitation of the current
implementation, not because of the nature of strings.
There's no reason for keeping that limitation.  Actually, as
we have changed the type Lisp_String in 21.1, it's not
difficult to make strings change length.

>>  If we get a unibyte string from a unibyte buffer by buffer-substring,
>>  how should we treat that string?

> Like any other unibyte string: as a sequence of raw bytes.
> If you want to treat it as a sequence of characters, then
> you need to pass it through `string-as-multibyte'.

If we regard that limitation as a nature of strings, your
idea is worth considering.  It seems that we can at least
construct a consistent explanation about its behaviour based
on your idea too.

------------------------------------------------------------
What a character in a unibyte buffer represents depends on the
context.  It may be a character represented by a single
byte, or a raw byte not yet decoded, or a byte constituting a
multibyte form of a different character.

On the other hand, a character in a unibyte string always
represents a raw byte.  Emacs coerces it into a character
represented by that single byte when a unibyte string is
concatenated with a multibyte string, or it is inserted in a
multibyte buffer.
------------------------------------------------------------

But, I'm not sure such a change is really necessary.  Are
you sure that the change doesn't break the current usage of
unibyte strings?

>>  The latter yields multibyte, but I think it's a bug.  I found
>>  that "(format "%s" 1)" is implemented by using
>>  prin1-to-string, and prin1-to-string prints an object to a
>>  temporary buffer and gets that buffer string.  So, in a
>>  multibyte session "(format "%s" 1)" yields a multibyte
>>  string.  :-(

> I know: I bumped into it yesterday while playing around with tar-mode.
> How about the attached patch ?

Please see the comments below.

>>  So, do you mean that you want this?
>>  
>>      If a unibyte buffer has \201\300 in the region FROM and TO,
>>  
>>      (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
>>  	=> "\201\300"
>>  
>>      (encode-coding-region FROM TO 'iso-latin-1) changes the
>>      region to \300.

> Yes, I guess I'd be happy with it.

>>  Isn't it more confusing?

> Not to me.

What do the other people think about it?

> PS: I wish there was a way to swap two buffers' contents so that
>     tar-mode could swap the (potentially very large) data to
>     a helper buffer (without needing to copy this large data)
>     and then use multibyte for the display and unibyte for
>     the helper buffer.

I don't understand what you mean, especially the usage of
the helper buffer.

I think tar-mode should use multiple buffers, one unibyte
buffer for tar-file itself, one multibyte buffer for table
of contents, and the other multibyte buffers (created on
demand) for viewing/editing files contained in the tar-file.
Then, tar mode works almost the same way as dired.  We can
see multibyte files in the different buffers.  We can use
the same method in arc-mode and also in RMAIL.

Is that different from what you mean?
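
The buffer layout described above could be set up along these lines.
This is only a sketch of the idea, not tar-mode's code; the function
name and the buffer names are made up for illustration.

```elisp
(defun my-make-archive-buffers (name)
  "Create a unibyte data buffer and a multibyte contents buffer for NAME.
Return them as a cons cell (DATA . TOC).  Hypothetical helper."
  (let ((data (generate-new-buffer (format " *%s data*" name)))
        (toc (generate-new-buffer (format "*%s contents*" name))))
    ;; The raw archive bytes live in a unibyte buffer.
    (with-current-buffer data
      (set-buffer-multibyte nil))
    ;; The table of contents is ordinary text, so it is multibyte.
    (with-current-buffer toc
      (set-buffer-multibyte t))
    (cons data toc)))
```

Buffers for individual member files would be created on demand in the
same way, each one multibyte.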

---
Ken'ichi HANDA
handa@m17n.org


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  7:49                 ` Kenichi Handa
@ 2003-02-26  8:05                   ` Kenichi Handa
  2003-02-26  8:08                     ` Stefan Monnier
  2003-02-26  8:12                   ` Stefan Monnier
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2003-02-26  8:05 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

In article <200302260749.QAA29494@etlken.m17n.org>, Kenichi Handa <handa@m17n.org> writes:
>>  I know: I bumped into it yesterday while playing around with tar-mode.
>>  How about the attached patch ?

> Please see the comments below.

Oops, I forgot to write it.

+  Fset_buffer_multibyte (Qt);

Shouldn't it be

+  Fset_buffer_multibyte (Vdefault_enable_multibyte_characters);

?

---
Ken'ichi HANDA
handa@m17n.org


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  8:05                   ` Kenichi Handa
@ 2003-02-26  8:08                     ` Stefan Monnier
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Monnier @ 2003-02-26  8:08 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> >>  I know: I bumped into it yesterday while playing around with tar-mode.
> >>  How about the attached patch ?
> 
> > Please see the comments below.
> 
> Oops, I forgot to write it.
> 
> +  Fset_buffer_multibyte (Qt);
> 
> Shouldn't it be
> 
> +  Fset_buffer_multibyte (Vdefault_enable_multibyte_characters);

Probably, indeed.


	Stefan


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  7:49                 ` Kenichi Handa
  2003-02-26  8:05                   ` Kenichi Handa
@ 2003-02-26  8:12                   ` Stefan Monnier
  2003-02-26  8:38                     ` tar-mode Kenichi Handa
  2003-02-26 23:26                   ` setenv -> locale-coding-system cannot handle ASCII?! Richard Stallman
  2003-02-26 23:26                   ` Richard Stallman
  3 siblings, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2003-02-26  8:12 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> In article <200302260550.h1Q5oSc08967@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> >>  Why is it not needed?  Strings and buffers are not that
> >>  different, both are containers of characters.
> 
> > They are used differently.  Operations on strings generally apply to the
> > whole string: you can only encode/decode a whole string at a time.
> 
> That's because of the limitation of the current
> implementation, not because of the nature of strings.

I think it's the nature of strings and of the functions
we provide on them.  If the user wants to do anything else,
she uses a buffer instead where modifications are easy to
make and where you have things like markers, point, ...

> There's no reason for keeping that limitation.  Actually, as
> we have changed the type Lisp_String in 21.1, it's not
> difficult to make strings change length.

Actually, strings are virtually never changed so it would be silly
to do that.  When a string needs to be changed, 99% of the functions
simply create a new string.  I know of only two cases where a string
is mutated: set-text-properties and aset.
The copy of Emacs that I use every day has aset disabled on strings
and works very well despite that (it did require a few minor
changes in a handful of packages).
Emacs strings are 99% immutable.  In practice it's also the case for
Scheme strings, BTW, and there have always been hot debates about whether
or not to change the Scheme language to specify that strings are
not mutable.

> ------------------------------------------------------------
> What a character in a unibyte buffer represents depends on the
> context.  It may be a character represented by a single
> byte, or a raw byte not yet decoded, or a byte constituting a
> multibyte form of a different character.
> 
> On the other hand, a character in a unibyte string always
> represents a raw byte.  Emacs coerces it into a character
> represented by that single byte when a unibyte string is
> concatenated with a multibyte string, or it is inserted in a
> multibyte buffer.
> ------------------------------------------------------------
> 
> But, I'm not sure such a change is really necessary.  Are
> you sure that the change doesn't break the current usage of
> unibyte strings?

I'm pretty sure it'll break current usage in a few places.

> > PS: I wish there was a way to swap two buffers' contents so that
> >     tar-mode could swap the (potentially very large) data to
> >     a helper buffer (without needing to copy this large data)
> >     and then use multibyte for the display and unibyte for
> >     the helper buffer.
> 
> I don't understand what you mean, especially the usage of
> the helper buffer.
> 
> I think tar-mode should use multiple buffers, one unibyte
> buffer for tar-file itself, one multibyte buffer for table
> of contents, and the other multibyte buffers (created on
> demand) for viewing/editing files contained in the tar-file.
> Then, tar mode works almost the same way as dired.  We can
> see multibyte files in the different buffers.  We can use
> the same method in arc-mode and also in RMAIL.
> 
> Is that different from what you mean?

No, that's exactly what I meant, but the problem is the following:

When tar-mode is called, the current buffer already contains the 24MB
binary content of the file and it is also the buffer that should
in the end contain the table of contents, so you need to somehow
move those 24MB from this buffer to a new one (the "helper" one).


	Stefan


* tar-mode
  2003-02-26  8:12                   ` Stefan Monnier
@ 2003-02-26  8:38                     ` Kenichi Handa
  2003-02-26  8:53                       ` tar-mode Stefan Monnier
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2003-02-26  8:38 UTC (permalink / raw)
  Cc: miles

I changed Subject:.

In article <200302260812.h1Q8CcO10018@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> When tar-mode is called, the current buffer already contains the 24MB
> binary content of the file and it is also the buffer that should
> in the end contain the table of contents, so you need to somehow
> move those 24MB from this buffer to a new one (the "helper" one).

I still don't understand the necessity of the helper buffer.
When tar-mode is called, usually the current buffer is
unibyte, so there's no need of moving the contents to
another buffer.  Instead, it creates a buffer for table of
contents, setup that buffer, then switch to that buffer.

---
Ken'ichi HANDA
handa@m17n.org


* Re: tar-mode
  2003-02-26  8:38                     ` tar-mode Kenichi Handa
@ 2003-02-26  8:53                       ` Stefan Monnier
  2003-02-26 11:53                         ` tar-mode Kenichi Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2003-02-26  8:53 UTC (permalink / raw)
  Cc: emacs-devel

> I still don't understand the necessity of the helper buffer.
> When tar-mode is called, usually the current buffer is
> unibyte, so there's no need of moving the contents to
> another buffer.  Instead, it creates a buffer for table of
> contents, setup that buffer, then switch to that buffer.

How are you going to "switch to that buffer" ?
Can you really expect that none of the callers have done some
kind of save-current-buffer ?

Hmm...looking at the code, it does seem like there really isn't
any save-current-buffer interfering.  Can we rely on that ?
Should we document it ?
It's rather unusual for a major-mode function to switch the
current buffer.


	Stefan


* Re: tar-mode
  2003-02-26  8:53                       ` tar-mode Stefan Monnier
@ 2003-02-26 11:53                         ` Kenichi Handa
  2003-02-26 12:22                           ` tar-mode Stefan Monnier
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2003-02-26 11:53 UTC (permalink / raw)
  Cc: emacs-devel

In article <200302260853.h1Q8rRQ10164@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>  I still don't understand the necessity of the helper buffer.
>>  When tar-mode is called, usually the current buffer is
>>  unibyte, so there's no need of moving the contents to
>>  another buffer.  Instead, it creates a buffer for table of
>>  contents, setup that buffer, then switch to that buffer.

> How are you going to "switch to that buffer" ?
> Can you really expect that none of the callers have done some
> kind of save-current-buffer ?

> Hmm...looking at the code, it does seem like there really isn't
> any save-current-buffer interfering.  Can we rely on that ?
> Should we document it ?

I have not yet considered this method in depth.

> It's rather unusual for a major-mode function to switch the
> current buffer.

Yes.  I agree that it's a fragile operation.  But, it seems
that it is the only way to avoid moving or copying 24MB
memory.

Ah...  I think I understand what you mean by "swap" and
"helper buffer".  What you are suggesting is something like
this, isn't it?

In the function tar-mode,
...
(setq helper-buffer (generate-new-buffer ...))
(swap-buffer-contents helper-buffer)
;; Now the current buffer is empty.
(tar-mode-setup-table-of-contents helper-buffer)
(add-hook 'write-file-functions
	  'tar-mode-write-helper-buffer)
...
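
For reference, later Emacs releases ship a primitive named
`buffer-swap-text' that exchanges the text of two buffers without
copying it; `swap-buffer-contents' above is only the name proposed in
this message.  A minimal sketch of the helper-buffer idea, assuming a
recent Emacs (24.3 or later for `defvar-local');
`my-archive-data-buffer' and `my-archive-move-data-to-helper' are
hypothetical names, not part of tar-mode:

```elisp
(defvar-local my-archive-data-buffer nil
  "Hypothetical variable: helper buffer holding the raw archive bytes.")

(defun my-archive-move-data-to-helper ()
  "Move the current buffer's text into a fresh helper buffer.
The text is exchanged with `buffer-swap-text', not copied.
Return the helper buffer; the current buffer is left empty."
  (let ((helper (generate-new-buffer
                 (format " *archive-data: %s*" (buffer-name)))))
    ;; HELPER is empty, so after the swap the current buffer is empty
    ;; and HELPER holds the (possibly very large) archive data.
    (buffer-swap-text helper)
    (setq my-archive-data-buffer helper)
    helper))
```

Because `buffer-swap-text' also exchanges the multibyte flag, a
buffer that was visited as unibyte hands that representation to the
helper along with the data, and the buffer the user sees takes on the
fresh helper's default representation.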

---
Ken'ichi HANDA
handa@m17n.org


* Re: tar-mode
  2003-02-26 11:53                         ` tar-mode Kenichi Handa
@ 2003-02-26 12:22                           ` Stefan Monnier
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Monnier @ 2003-02-26 12:22 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> > It's rather unusual for a major-mode function to switch the
> > current buffer.
> 
> Yes.  I agree that it's a fragile operation.  But, it seems
> that it is the only way to avoid moving or copying 24MB
> memory.
> 
> Ah...  I think I understand what you mean by "swap" and
> "helper buffer".  What you are suggesting is something like
> this, isn't it?

Exactly!


	Stefan


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  0:58     ` Kenichi Handa
  2003-02-26  2:11       ` Stefan Monnier
@ 2003-02-26 23:25       ` Richard Stallman
  1 sibling, 0 replies; 27+ messages in thread
From: Richard Stallman @ 2003-02-26 23:25 UTC (permalink / raw)
  Cc: miles

    By the way, I still think it's better to have multibyte
    strings in process-environment in a multibyte session.

That is unacceptable because it would make inheritance of
envvars unreliable.


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  5:32             ` Kenichi Handa
  2003-02-26  5:50               ` Stefan Monnier
@ 2003-02-26 23:26               ` Richard Stallman
  2003-02-27  0:06                 ` Miles Bader
  1 sibling, 1 reply; 27+ messages in thread
From: Richard Stallman @ 2003-02-26 23:26 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

      So, in a
    multibyte session "(format "%s" 1)" yields a multibyte
    string.  :-(

If format sees that all the characters are one-byte, it could make
this a unibyte string.
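
The check suggested here can be sketched in Lisp: a multibyte string
whose byte length equals its character length contains only ASCII, so
it can be converted without loss.  `my-ascii-only-to-unibyte' is a
hypothetical helper, not an existing Emacs function, and the sketch
assumes an Emacs that provides `string-to-unibyte':

```elisp
(defun my-ascii-only-to-unibyte (string)
  "Return STRING as a unibyte string if it is multibyte but pure ASCII.
Any other string is returned unchanged."
  (if (and (multibyte-string-p string)
           ;; In the internal representation every non-ASCII character
           ;; occupies more than one byte, so equal counts mean ASCII only.
           (= (length string) (string-bytes string)))
      (string-to-unibyte string)
    string))
```

A caller such as setenv could pass its formatted "VAR=VALUE" string
through a helper like this before deciding whether encoding is needed.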

    By the way, I also really really hate this unibyte/multibyte
    problem.  Sometimes I think I should have opposed the
    introduction of such a concept more strongly.

It is possible that nowadays we do not need unibyte buffers and
strings any more.  Perhaps now the only kind of benefit that
a unibyte buffer provides is a speed advantage for Tar mode
and similar programs.  It would be interesting to find out if
tar mode works right with multibyte buffers, and if so, how
much of a slowdown that entails when reading a large tar file.

But there may be another benefit.  Using unibyte buffers would also
mean that the user will never be asked what encoding to use.  And some
users may really dislike being asked!

We could imagine replacing the current unibyte mode of operation with
one in which visiting files always uses no-conversion but makes
multibyte buffers, and then make sure that works just like using
unibyte buffers does now.  If we get that to work well, then we
could use that instead of unibyte mode.  Then we could reserve
unibyte buffers for things like tar-mode, or eliminate them
entirely if tar-mode runs fast enough with multibyte buffers.


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  5:50               ` Stefan Monnier
  2003-02-26  7:49                 ` Kenichi Handa
@ 2003-02-26 23:26                 ` Richard Stallman
  1 sibling, 0 replies; 27+ messages in thread
From: Richard Stallman @ 2003-02-26 23:26 UTC (permalink / raw)
  Cc: miles

    +  if (ZV == ZV_BYTE)
    +    Fset_buffer_multibyte (Qnil);
       object = Fbuffer_string ();

I think it is much more efficient to change the string
rather than the buffer.  And cleaner.


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  7:49                 ` Kenichi Handa
  2003-02-26  8:05                   ` Kenichi Handa
  2003-02-26  8:12                   ` Stefan Monnier
@ 2003-02-26 23:26                   ` Richard Stallman
  2003-02-26 23:26                   ` Richard Stallman
  3 siblings, 0 replies; 27+ messages in thread
From: Richard Stallman @ 2003-02-26 23:26 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

    What a character in a unibyte buffer represents depends on the
    context.  It may be a character represented by a single
    byte, or a raw byte not yet decoded, or a byte constituting a
    multibyte form of a different character.

    On the other hand, a character in a unibyte string always
    represents a raw byte.  Emacs coerces it into a character
    represented by that single byte when a unibyte string is
    concatenated with a multibyte string, or it is inserted in a
    multibyte buffer.
    ------------------------------------------------------------

    But, I'm not sure such a change is really necessary.  Are
    you sure that the change doesn't break the current usage of
    unibyte strings?

To me this seems like a limitation, not an improvement.  If you cut a
string out of a unibyte buffer, its contents have the same kind of
meaning as the unibyte buffer's contents.

Practically speaking, to implement this would seem to mean
removing features, but which features?

Meanwhile, I wonder whether there is any difference between
handling unibyte text inserted from a unibyte string
and unibyte text inserted from a unibyte buffer with
insert-buffer-substring.  I would expect and hope they
work the same.
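
One way to check this is to insert the same byte by both routes and
compare the results.  A minimal sketch, assuming an Emacs that
provides `unibyte-string'; `my-compare-unibyte-insertion' is a
hypothetical name:

```elisp
(defun my-compare-unibyte-insertion (byte)
  "Insert BYTE into a multibyte buffer by two routes and compare.
Route 1 inserts a unibyte string; route 2 uses
`insert-buffer-substring' from a unibyte buffer.
Return (VIA-STRING VIA-BUFFER SAME-P)."
  (let ((ustr (unibyte-string byte))
        via-string via-buffer)
    (with-temp-buffer
      ;; Source buffer: unibyte, holding just BYTE.
      (set-buffer-multibyte nil)
      (insert ustr)
      (let ((src (current-buffer)))
        (with-temp-buffer
          ;; Destination buffer: explicitly multibyte.
          (set-buffer-multibyte t)
          (insert ustr)
          (setq via-string (buffer-string))
          (erase-buffer)
          (insert-buffer-substring src)
          (setq via-buffer (buffer-string)))))
    (list via-string via-buffer (equal via-string via-buffer))))
```

Calling it with a byte above 127, for example
(my-compare-unibyte-insertion 200), exercises the non-ASCII case
discussed here.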


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26  7:49                 ` Kenichi Handa
                                     ` (2 preceding siblings ...)
  2003-02-26 23:26                   ` setenv -> locale-coding-system cannot handle ASCII?! Richard Stallman
@ 2003-02-26 23:26                   ` Richard Stallman
  3 siblings, 0 replies; 27+ messages in thread
From: Richard Stallman @ 2003-02-26 23:26 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

    > PS: I wish there was a way to swap two buffers' contents so that
    >     tar-mode could swap the (potentially very large) data to
    >     a helper buffer (without needing to copy this large data)

    I think tar-mode should use multiple buffers, one unibyte
    buffer for tar-file itself, one multibyte buffer for table
    of contents, and the other multibyte buffers (created on
    demand) for viewing/editing files contained in the tar-file.

It does use other multibyte buffers (created on demand) for
viewing/editing the contained files.  So the change here would
be in using two buffers for the main file.

You would want the buffer that users know about to contain the ASCII
directory listing.  So the buffer that contains the actual tar file
would want to be another buffer.  That would work fine except perhaps
for complications in reading in the file.  The feature of
swapping contents between two buffers might indeed be useful here.
And it is probably easy to implement.

    Hmm...looking at the code, it does seem like there really isn't
    any save-current-buffer interfering.  Can we rely on that ?
    Should we document it ?
    It's rather unusual for a major-mode function to switch the
    current buffer.

I think that is a kludge, and that contents-swapping for the
same buffer is much cleaner.


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-26 23:26               ` Richard Stallman
@ 2003-02-27  0:06                 ` Miles Bader
  2003-03-03 18:59                   ` Richard Stallman
  0 siblings, 1 reply; 27+ messages in thread
From: Miles Bader @ 2003-02-27  0:06 UTC (permalink / raw)
  Cc: Kenichi Handa

On Wed, Feb 26, 2003 at 06:26:07PM -0500, Richard Stallman wrote:
> It is possible that nowadays we do not need unibyte buffers and
> strings any more.  Perhaps now the only kind of benefit that
> a unibyte buffer provides is a speed advantage for Tar mode
> and similar programs.
> 
> But there may be another benefit.  Using unibyte buffers would also
> mean that the user will never be asked what encoding to use.  And some
> users may really dislike being asked!

The whole problem in my mind (I'm no expert) is the conflation of these two
uses for `unibyte.'  It seems to me that `efficiency' should _never_ be a
reason to use a unibyte buffer, because the emacs primitives should take care
of it automatically -- that is, a buffer/string should have an associated
`unibyte encoding' attribute, which would allow it to be encoded using the
straightforward and efficient `unibyte representation' but appear to
lisp/whoever as being a multibyte buffer/string (all of whose characters
happen to have the same charset).  Obviously inserting a character that
didn't match that attribute would be painful (causing the buffer to be
completely converted to a real multibyte representation), but well, don't do
that... :-)

-miles

-- 
Fast, small, soon; pick any 2.


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-02-27  0:06                 ` Miles Bader
@ 2003-03-03 18:59                   ` Richard Stallman
  2003-03-04  2:48                     ` Miles Bader
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Stallman @ 2003-03-03 18:59 UTC (permalink / raw)
  Cc: handa

      It seems to me that `efficiency' should _never_ be a
    reason to use a unibyte buffer, because the emacs primitives should take care
    of it automatically -- that is, a buffer/string should have an associated
    `unibyte encoding' attribute, which would allow it to be encoded using the
    straightforward and efficient `unibyte representation' but appear to
    lisp/whoever as being a multibyte buffer/string (all of whose characters
    happen to have the same charset).

This is more or less what a unibyte buffer is now, except that there
is only one possibility for which character sets can be stored in it:
it holds the character codes from 0 to 0377.

If we wanted to hide from the user the distinction between unibyte and
multibyte buffers, we would have to change the buffer's representation
automatically when inserting characters that don't fit unibyte.  That
seems like a bad idea.

The advantage of unibyte mode for some European Latin-N users is that
they don't have to deal with encoding and decoding, so they never have
to specify a coding system.  It is possible that today we could get
the same results using multibyte buffers and forcing use of a specific
Latin-N coding system.  People could try experimenting with this and
seeing if it provides results that are just like what European users
now get with unibyte mode.

As for the idea that efficiency should never be a factor in deciding
what to do here, I am skeptical of that.


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-03-03 18:59                   ` Richard Stallman
@ 2003-03-04  2:48                     ` Miles Bader
  2003-03-04  4:33                       ` Kenichi Handa
  2003-03-05 20:46                       ` Richard Stallman
  0 siblings, 2 replies; 27+ messages in thread
From: Miles Bader @ 2003-03-04  2:48 UTC (permalink / raw)
  Cc: handa

Richard Stallman <rms@gnu.org> writes:
>     a buffer/string should have an associated `unibyte encoding'
>     attribute, which would allow it to be encoded using the
>     straightforward and efficient `unibyte representation' but appear
>     to lisp/whoever as being a multibyte buffer/string (all of whose
>     characters happen to have the same charset).
> 
> This is more or less what a unibyte buffer is now, except that there
> is only one possibility for which character sets can be stored in it:
> it holds the character codes from 0 to 0377.

Yeah, but I'm saying that emacs should be able to use this efficient
representation for other character sets as well -- I think it's far more
common to have buffers storing non-raw 8-bit characters than raw
characters, so why is the uncommon case optimized?

> If we wanted to hide from the user the distinction between unibyte and
> multibyte buffers, we would have to change the buffer's representation
> automatically when inserting characters that don't fit unibyte.  That
> seems like a bad idea.

Well I agree that it would be annoying if your 10-megabyte raw-bytes buffer
suddenly got converted because you accidentally inserted a Chinese
character. :-)

However I think that in many cases such a conversion would be OK, and
since 99% of the time, people _don't_ mix character sets, it would
probably be a win on average.

Maybe there could be a buffer-local variable that `locks' the buffer's
character set, and would cause an error to be signalled if some code
attempts to insert non-compatible text (instead of converting the
buffer)?  This might better catch errors in coding than the current
`just insert the raw-codes' unibyte buffers (if you _really_ want to
insert the raw-codes, you can of course do so explicitly).

> The advantage of unibyte mode for some European Latin-N users is that
> they don't have to deal with encoding and decoding, so they never have
> to specify a coding system.  It is possible that today we could get
> the same results using multibyte buffers and forcing use of a specific
> Latin-N coding system.  People could try experimenting with this and
> seeing if it provides results that are just like what European users
> now get with unibyte mode.

Perhaps the same advantages could be had, without making a special case,
by having an `uninterpreted' character set, which would effectively be
treated by the display code as `just send whatever code raw to the terminal.'

> As for the idea that efficiency should never be a factor in deciding
> what to do here, I am skeptical of that.

I'm not saying that efficiency isn't an issue, I'm saying that lisp
programmers shouldn't have to worry about it as much.  They should be
able to just use `normal' coding methods (which currently means
multibyte by default), and expect that emacs would optimize this in
certain common cases; currently, if a lisp programmer wants extra
efficiency, he's got to use special and more dangerous operations.

I realize that what I'm suggesting is a bit much, at least for the near
future, but I also think the current design is somewhat broken, and
makes it too easy for programmers to do the wrong thing.

-Miles
-- 
Ich bin ein Virus. Mach' mit und kopiere mich in Deine .signature.


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-03-04  2:48                     ` Miles Bader
@ 2003-03-04  4:33                       ` Kenichi Handa
  2003-03-05 20:46                       ` Richard Stallman
  1 sibling, 0 replies; 27+ messages in thread
From: Kenichi Handa @ 2003-03-04  4:33 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

In article <buod6l73kyu.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
> Richard Stallman <rms@gnu.org> writes:
>>      a buffer/string should have an associated `unibyte encoding'
>>      attribute, which would allow it to be encoded using the
>>      straightforward and efficient `unibyte representation' but appear
>>      to lisp/whoever as being a multibyte buffer/string (all of whose
>>      characters happen to have the same charset).
>>  
>>  This is more or less what a unibyte buffer is now, except that there
>>  is only one possibility for which character sets can be stored in it:
>>  it holds the character codes from 0 to 0377.

> Yeah, but I'm saying that emacs should be able to use this efficient
> representation for other character sets as well -- I think it's far more
> common to have buffers storing non-raw 8-bit characters than raw
> characters, so why is the uncommon case optimized?

As for memory, such optimization may be worth considering
except for CJK users, but as for speed, not that much.  And
in emacs-unicode, it gets worse.  And, memory is not a big
problem nowadays.

On the other hand, for the operations on raw bytes, the
efficiency of using unibyte buffer/string is really great.

[...]
> but I also think the current design is somewhat broken, and
> makes it too easy for programmers to do the wrong thing.

I agree, and, I think the main reason is the automatic
adjustment of unibyte<->multibyte.  It may be a nifty
feature for users, but a very difficult feature for
programmers (including emacs maintainers).

---
Ken'ichi HANDA
handa@m17n.org


* Re: setenv -> locale-coding-system cannot handle ASCII?!
  2003-03-04  2:48                     ` Miles Bader
  2003-03-04  4:33                       ` Kenichi Handa
@ 2003-03-05 20:46                       ` Richard Stallman
  1 sibling, 0 replies; 27+ messages in thread
From: Richard Stallman @ 2003-03-05 20:46 UTC (permalink / raw)
  Cc: handa

    > If we wanted to hide from the user the distinction between unibyte and
    > multibyte buffers, we would have to change the buffer's representation
    > automatically when inserting characters that don't fit unibyte.  That
    > seems like a bad idea.

    Well I agree that it would be annoying if your 10-megabyte raw-bytes buffer
    suddenly got converted because you accidentally inserted a Chinese
    character. :-)

    However I think that in many cases such a conversion would be OK, and
    since 99% of the time, people _don't_ mix character sets, it would
    probably be a win on average.

This kind of internally-unibyte buffer would serve only one purpose:
efficiency.  If we decide to eliminate the current user-level feature
of unibyte buffers, then we could implement this efficiency feature
if we decide it is worth the effort.

Whether to eliminate the current user-level feature is a bigger question.
We would want to make sure that we can offer the people who use it
a mode of operation that is just as satisfactory.


Thread overview: 27+ messages
2003-02-25  0:18 setenv -> locale-coding-system cannot handle ASCII?! Sam Steingold
2003-02-25  6:34 ` Kenichi Handa
2003-02-25  6:47   ` Miles Bader
2003-02-26  0:58     ` Kenichi Handa
2003-02-26  2:11       ` Stefan Monnier
2003-02-26  2:34         ` Kenichi Handa
2003-02-26  2:52           ` Stefan Monnier
2003-02-26  5:32             ` Kenichi Handa
2003-02-26  5:50               ` Stefan Monnier
2003-02-26  7:49                 ` Kenichi Handa
2003-02-26  8:05                   ` Kenichi Handa
2003-02-26  8:08                     ` Stefan Monnier
2003-02-26  8:12                   ` Stefan Monnier
2003-02-26  8:38                     ` tar-mode Kenichi Handa
2003-02-26  8:53                       ` tar-mode Stefan Monnier
2003-02-26 11:53                         ` tar-mode Kenichi Handa
2003-02-26 12:22                           ` tar-mode Stefan Monnier
2003-02-26 23:26                   ` setenv -> locale-coding-system cannot handle ASCII?! Richard Stallman
2003-02-26 23:26                   ` Richard Stallman
2003-02-26 23:26                 ` Richard Stallman
2003-02-26 23:26               ` Richard Stallman
2003-02-27  0:06                 ` Miles Bader
2003-03-03 18:59                   ` Richard Stallman
2003-03-04  2:48                     ` Miles Bader
2003-03-04  4:33                       ` Kenichi Handa
2003-03-05 20:46                       ` Richard Stallman
2003-02-26 23:25       ` Richard Stallman
