* Re: Emacs-diffs Digest, Vol 2, Issue 28
[not found] <E18ZDQC-0003mt-02@monty-python.gnu.org>
@ 2003-01-18 0:48 ` Richard Stallman
2003-01-18 12:35 ` Kim F. Storm
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: Richard Stallman @ 2003-01-18 0:48 UTC
! The string argument is normally a multibyte string, except:
! - if the process' input coding system is no-conversion or raw-text,
! it is a unibyte string (the non-converted input), or else
Is this really the right way for it to work?
Should the choice of unibyte or multibyte string
be tied in this way to the choice of coding system?
If you want multibyte strings "without decoding", would emacs-mule
give you that?
Handa and Eli, what do you think?
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-18 0:48 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman
@ 2003-01-18 12:35 ` Kim F. Storm
2003-01-18 12:40 ` Eli Zaretskii
2003-01-20 1:52 ` Emacs-diffs Digest, Vol 2, Issue 28 Kenichi Handa
2 siblings, 0 replies; 49+ messages in thread
From: Kim F. Storm @ 2003-01-18 12:35 UTC
Cc: emacs-devel
Richard Stallman <rms@gnu.org> writes:
> ! The string argument is normally a multibyte string, except:
> ! - if the process' input coding system is no-conversion or raw-text,
> ! it is a unibyte string (the non-converted input), or else
>
> Is this really the right way for it to work?
> Should the choice of unibyte or multibyte string
> be tied in this way to the choice of coding system?
Maybe provide `set-process-multibyte' as an analogue to `set-buffer-multibyte'.
>
> If you want multibyte strings "without decoding", would emacs-mule
> give you that?
That's what string-as-multibyte is for, I think.
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-18 0:48 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman
2003-01-18 12:35 ` Kim F. Storm
@ 2003-01-18 12:40 ` Eli Zaretskii
2003-01-20 0:49 ` Richard Stallman
2003-01-20 2:29 ` unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] Kenichi Handa
2003-01-20 1:52 ` Emacs-diffs Digest, Vol 2, Issue 28 Kenichi Handa
2 siblings, 2 replies; 49+ messages in thread
From: Eli Zaretskii @ 2003-01-18 12:40 UTC
Cc: emacs-devel
> From: Richard Stallman <rms@gnu.org>
> Date: Fri, 17 Jan 2003 19:48:02 -0500
>
> ! The string argument is normally a multibyte string, except:
> ! - if the process' input coding system is no-conversion or raw-text,
> ! it is a unibyte string (the non-converted input), or else
>
> Is this really the right way for it to work?
> Should the choice of unibyte or multibyte string
> be tied in this way to the choice of coding system?
I think users expect this behavior when they use no-conversion or
raw-text.
> If you want multibyte strings "without decoding", would emacs-mule
> give you that?
I don't think so. emacs-mule is for reading text that is already in
the internal Emacs representation, like auto-save files. AFAIK,
raw-text does decode the text in the sense that 8-bit characters which
have their 8th bit set are decoded into the eight-bit-* charsets.
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-18 12:40 ` Eli Zaretskii
@ 2003-01-20 0:49 ` Richard Stallman
2003-01-20 2:29 ` unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] Kenichi Handa
1 sibling, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-01-20 0:49 UTC
Cc: emacs-devel
> Is this really the right way for it to work?
> Should the choice of unibyte or multibyte string
> be tied in this way to the choice of coding system?
Maybe provide `set-process-multibyte' as an analogue to `set-buffer-multibyte'.
I think that is a good idea.
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-18 0:48 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman
2003-01-18 12:35 ` Kim F. Storm
2003-01-18 12:40 ` Eli Zaretskii
@ 2003-01-20 1:52 ` Kenichi Handa
2003-01-21 18:18 ` Richard Stallman
2003-01-21 18:18 ` Richard Stallman
2 siblings, 2 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-20 1:52 UTC
Cc: emacs-devel
In article <E18Zh9W-00012L-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> ! The string argument is normally a multibyte string, except:
> ! - if the process' input coding system is no-conversion or raw-text,
> ! it is a unibyte string (the non-converted input), or else
> Is this really the right way for it to work?
> Should the choice of unibyte or multibyte string
> be tied in this way to the choice of coding system?
This facility was added upon someone's request or to fix
some problem long ago.
1998-12-21 Kenichi Handa <handa@etl.go.jp>
[...]
* process.c (read_process_output): Decide the multibyteness of
string given to a process filter by a coding system used for
decoding the process output.
But I don't remember the details now.
> If you want multibyte strings "without decoding", would emacs-mule
> give you that?
It depends on what kind of multibyte string we want.
If we want the same result as reading a file containing the
same byte sequence by emacs-mule, emacs-mule is fine. This
is the same as reading by no-conversion and inserting the
given unibyte string by:
(insert (string-as-multibyte UNIBYTE-STRING)).
If we want a multibyte sequence that is the same as the
result of converting each of the original bytes by
unibyte-char-to-multibyte, we must read by no-conversion,
and insert the given unibyte string just by `insert':
(insert UNIBYTE-STRING)
This is the same as doing:
(insert (string-make-multibyte UNIBYTE-STRING))
If we want a multibyte sequence in which each character is
one of ascii, eight-bit-control, or eight-bit-graphic,
corresponding to the original bytes, we must read by
no-conversion and insert characters one by one as below:
(apply 'insert (string-to-list UNIBYTE-STRING))
Perhaps we should have a function, say, string-to-multibyte,
to make this possible:
(insert (string-to-multibyte UNIBYTE-STRING))
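A minimal sketch of the per-byte rule behind the third option (the proposed string-to-multibyte), written in Python as a model rather than as Emacs code. It assumes the Emacs 21 charset ranges discussed in this thread: eight-bit-control for bytes 0x80-0x9F and eight-bit-graphic for 0xA0-0xFF. `charset_of`, `to_multibyte_chars` and `back_to_bytes` are hypothetical names:

```python
from typing import List, Tuple

# Hypothetical model, not Emacs code: a "character" is (charset name, code).
Char = Tuple[str, int]


def charset_of(byte: int) -> str:
    """Charset a raw byte lands in under the TO conversion (Emacs 21 ranges)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte: %r" % byte)
    if byte < 0x80:
        return "ascii"
    if byte < 0xA0:
        return "eight-bit-control"
    return "eight-bit-graphic"


def to_multibyte_chars(unibyte: bytes) -> List[Char]:
    """One character per byte; no byte is reinterpreted or merged."""
    return [(charset_of(b), b) for b in unibyte]


def back_to_bytes(chars: List[Char]) -> bytes:
    """Inverse of to_multibyte_chars: recover the original bytes."""
    return bytes(code for _, code in chars)


if __name__ == "__main__":
    raw = b"a\x81\xc0"
    chars = to_multibyte_chars(raw)
    print(chars)
    assert back_to_bytes(chars) == raw
```

Every byte becomes exactly one character and keeps its value, so the round trip is lossless. That is the property wanted when reading with no-conversion.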
---
Ken'ichi HANDA
handa@m17n.org
* unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-18 12:40 ` Eli Zaretskii
2003-01-20 0:49 ` Richard Stallman
@ 2003-01-20 2:29 ` Kenichi Handa
2003-01-20 18:48 ` Eli Zaretskii
1 sibling, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-01-20 2:29 UTC
Cc: emacs-devel
In article <3405-Sat18Jan2003154003+0200-eliz@is.elta.co.il>, "Eli Zaretskii" <eliz@is.elta.co.il> writes:
>> If you want multibyte strings "without decoding", would emacs-mule
>> give you that?
> I don't think so. emacs-mule is for reading text that is already in
> the internal Emacs representation, like auto-save files.
Yes.
> AFAIK, raw-text does decode the text in the sense that
> 8-bit characters which have their 8th bit set are decoded
> into the eight-bit-* charsets.
Yes, but that is only in the case that you read a file into
a multibyte buffer by raw-text. This conversion from a raw
byte sequence to multibyte form is what is done by
string-to-multibyte, which I described in the previous mail.
On process reading, if raw-text is used, the process output
is first read as a unibyte string, then converted to
multibyte by string-as-multibyte (not by the not-yet-existing
string-to-multibyte), and then inserted in a multibyte buffer.
I don't remember why the current code does this. I
think the behaviour Eli described is more consistent with
the behaviour of file reading.
Shall I change the code to do what Eli described (by
introducing the new function string-to-multibyte)?
By the way, it may be cleaner to have all these functions in
parallel, and to devote one section of the Info manual to the
differences among the MAKE, AS, and TO conversions.
string-make-multibyte
string-as-multibyte
string-to-multibyte
string-make-unibyte
string-as-unibyte
string-to-unibyte (perhaps the same as string-as-unibyte, or
it should signal an error if the string contains anything
other than ascii and eight-bit-XXX characters).
buffer-make-multibyte
buffer-as-multibyte (same as (set-buffer-multibyte BUFFER t))
buffer-to-multibyte
buffer-make-unibyte
buffer-as-unibyte (same as (set-buffer-multibyte BUFFER nil))
buffer-to-unibyte
---
Ken'ichi HANDA
handa@m17n.org
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-20 2:29 ` unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] Kenichi Handa
@ 2003-01-20 18:48 ` Eli Zaretskii
2003-01-20 20:55 ` Stefan Monnier
2003-01-21 0:10 ` Kenichi Handa
0 siblings, 2 replies; 49+ messages in thread
From: Eli Zaretskii @ 2003-01-20 18:48 UTC
Cc: emacs-devel
> Date: Mon, 20 Jan 2003 11:29:51 +0900 (JST)
> From: Kenichi Handa <handa@m17n.org>
>
> > AFAIK, raw-text does decode the text in the sense that
> > 8-bit characters which have their 8th bit set are decoded
> > into the eight-bit-* charsets.
>
> Yes, but that is only in the case that you read a file into
> a multibyte buffer by raw-text. This conversion from a raw
> byte sequence to multibyte form is what is done by
> string-to-multibyte, which I described in the previous mail.
>
> On process reading, if raw-text is used, the process output
> is first read as a unibyte string, then converted to
> multibyte by string-as-multibyte (not by the not-yet-existing
> string-to-multibyte), and then inserted in a multibyte buffer.
Sorry, I don't think I understand the difference. What will we have
in the buffer after process output is converted as you describe in the
last paragraph above?
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-20 18:48 ` Eli Zaretskii
@ 2003-01-20 20:55 ` Stefan Monnier
2003-01-21 0:20 ` Kenichi Handa
2003-01-22 9:59 ` Richard Stallman
2003-01-21 0:10 ` Kenichi Handa
1 sibling, 2 replies; 49+ messages in thread
From: Stefan Monnier @ 2003-01-20 20:55 UTC
Cc: handa
> > Date: Mon, 20 Jan 2003 11:29:51 +0900 (JST)
> > From: Kenichi Handa <handa@m17n.org>
> >
> > > AFAIK, raw-text does decode the text in the sense that
> > > 8-bit characters which have their 8th bit set are decoded
> > > into the eight-bit-* charsets.
> >
> > Yes, but that is only in the case that you read a file into
> > a multibyte buffer by raw-text. This conversion from a raw
> > byte sequence to multibyte form is what is done by
> > string-to-multibyte, which I described in the previous mail.
> >
> > On process reading, if raw-text is used, the process output
> > is first read as a unibyte string, then converted to
> > multibyte by string-as-multibyte (not by the not-yet-existing
> > string-to-multibyte), and then inserted in a multibyte buffer.
>
> Sorry, I don't think I understand the difference. What will we have
> in the buffer after process output is converted as you describe in the
> last paragraph above?
While we're at it, how about making string-as-multibyte obsolete ?
It's not used much and it has been abused many times in the past.
Also, I believe it's more or less equivalent to
(decode-coding-string str 'emacs-mule)
I'd be interested to know what the differences are, if any.
Stefan
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-20 18:48 ` Eli Zaretskii
2003-01-20 20:55 ` Stefan Monnier
@ 2003-01-21 0:10 ` Kenichi Handa
2003-01-21 0:45 ` Stefan Monnier
` (2 more replies)
1 sibling, 3 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-21 0:10 UTC
Cc: emacs-devel
In article <6480-Mon20Jan2003214849+0200-eliz@is.elta.co.il>, "Eli Zaretskii" <eliz@is.elta.co.il> writes:
>> On process reading, if raw-text is used, the process output
>> is first read as a unibyte string, then converted to
>> multibyte by string-as-multibyte (not by the not-yet-existing
>> string-to-multibyte), and then inserted in a multibyte buffer.
> Sorry, I don't think I understand the difference. What will we have
> in the buffer after process output is converted as you describe in the
> last paragraph above?
Ok, here's an example (Latin-1 lang. env.).
unibyte sequence (hex): 81 81 C0 C0
                        result of conversion      display in multibyte buffer
string-as-multibyte:    9E A1 81 C0 C0            \201À\300
string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300
(1) Reading a process output by raw-text into a multibyte
buffer does AS conversion. I think this should do TO
conversion to be consistent with (3).
(2) Reading a file by raw-text (resulting in a unibyte
buffer) and copying the contents into a multibyte buffer
does MAKE conversion. This is Emacs' default
unibyte->multibyte conversion.
(3) Inserting a file by raw-text in a multibyte buffer does
TO conversion.
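The following small Python model reproduces the three rows of the table. It is a sketch under stated assumptions, not Emacs code. It models only the pieces of the Emacs 21 internal representation that are visible in the table:

- ASCII is stored as the byte itself.
- eight-bit-control is stored as 9E followed by byte+0x20.
- eight-bit-graphic is stored as the byte itself.
- Latin-1 is stored as leading code 81 followed by the code.

It also assumes a Latin-1 language environment for MAKE. All function names are hypothetical:

```python
# Hypothetical model, not Emacs source: only the parts of the Emacs 21
# internal byte layout that appear in the table are represented.
#   ascii 00-7F              -> the byte itself
#   eight-bit-control 80-9F  -> 9E, byte + 0x20
#   eight-bit-graphic A0-FF  -> the byte itself
#   latin-iso8859-1 A0-FF    -> 81 (leading code), code
LATIN1_LEAD = 0x81
EIGHT_BIT_CONTROL_LEAD = 0x9E


def raw_byte(b: int) -> bytes:
    """Internal form of byte B kept as an ascii or eight-bit-* character."""
    if 0x80 <= b < 0xA0:
        return bytes([EIGHT_BIT_CONTROL_LEAD, b + 0x20])
    return bytes([b])


def to_multibyte(s: bytes) -> bytes:
    """TO: every byte becomes its own ascii/eight-bit-* character."""
    return b"".join(raw_byte(b) for b in s)


def make_multibyte(s: bytes) -> bytes:
    """MAKE, Latin-1 environment assumed: A0-FF become Latin-1 letters."""
    return b"".join(bytes([LATIN1_LEAD, b]) if b >= 0xA0 else raw_byte(b)
                    for b in s)


def as_multibyte(s: bytes) -> bytes:
    """AS: keep byte pairs that already form a valid internal sequence;
    any other byte falls back to an ascii/eight-bit-* character."""
    out, i = [], 0
    while i < len(s):
        b = s[i]
        nxt = s[i + 1] if i + 1 < len(s) else None
        pair_ok = nxt is not None and (
            (b == LATIN1_LEAD and nxt >= 0xA0)
            or (b == EIGHT_BIT_CONTROL_LEAD and 0xA0 <= nxt < 0xC0))
        if pair_ok:
            out.append(s[i:i + 2])
            i += 2
        else:
            out.append(raw_byte(b))
            i += 1
    return b"".join(out)


def display(internal: bytes) -> str:
    """Render internal bytes as the table does: \\ooo for eight-bit chars."""
    out, i = [], 0
    while i < len(internal):
        b = internal[i]
        if b == LATIN1_LEAD:
            out.append(chr(internal[i + 1]))
            i += 2
        elif b == EIGHT_BIT_CONTROL_LEAD:
            out.append("\\%o" % (internal[i + 1] - 0x20))
            i += 2
        elif b >= 0xA0:
            out.append("\\%o" % b)
            i += 1
        else:
            out.append(chr(b))  # ascii
            i += 1
    return "".join(out)


if __name__ == "__main__":
    src = bytes.fromhex("81 81 C0 C0")
    for name, conv in [("string-as-multibyte", as_multibyte),
                       ("string-make-multibyte", make_multibyte),
                       ("string-to-multibyte", to_multibyte)]:
        res = conv(src)
        label = name + ":"
        hexs = res.hex(" ").upper()
        print(f"{label:24}{hexs:26}{display(res)}")
```

Running it prints the byte sequences and display forms listed in the table. The model also shows why AS is the odd one out. Its result depends on which adjacent bytes happen to form a valid internal sequence (here the second 81 pairs with the first C0). MAKE and TO, by contrast, convert each byte independently.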
---
Ken'ichi HANDA
handa@m17n.org
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-20 20:55 ` Stefan Monnier
@ 2003-01-21 0:20 ` Kenichi Handa
2003-01-21 0:54 ` Stefan Monnier
2003-01-21 5:57 ` Eli Zaretskii
2003-01-22 9:59 ` Richard Stallman
1 sibling, 2 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-21 0:20 UTC
Cc: emacs-devel
In article <200301202055.h0KKtun11691@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> While we're at it, how about making string-as-multibyte obsolete ?
> It's not used much and it has been abused many times in the past.
It is useful for instance in this scenario. I don't
remember a concrete example, but some package is doing this
kind of thing.
Read a file of weird encoding in a unibyte buffer. Parse
the contents and run decode-coding-region on one chunk at a
time with different coding systems. Extract some part from
that unibyte buffer and insert it in a multibyte buffer. The
last step is:
(let ((str (buffer-substring FROM TO)))
(save-excursion
(set-buffer MULTIBYTE-BUF)
(insert (string-as-multibyte str))))
> Also, I believe it's more or less equivalent to
> (decode-coding-string str 'emacs-mule)
Yes, for the moment. But, in emacs-unicode, we must change
it to:
(decode-coding-string str 'utf-8-emacs)
On the other hand, code using string-as-multibyte doesn't
have to change.
---
Ken'ichi HANDA
handa@m17n.org
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 0:10 ` Kenichi Handa
@ 2003-01-21 0:45 ` Stefan Monnier
2003-01-21 6:01 ` Eli Zaretskii
2003-01-21 8:04 ` Kenichi Handa
2003-01-21 5:56 ` Eli Zaretskii
2003-01-22 10:00 ` Richard Stallman
2 siblings, 2 replies; 49+ messages in thread
From: Stefan Monnier @ 2003-01-21 0:45 UTC
Cc: emacs-devel
> In article <6480-Mon20Jan2003214849+0200-eliz@is.elta.co.il>, "Eli Zaretskii" <eliz@is.elta.co.il> writes:
> >> On process reading, if raw-text is used, the process output
> >> is first read as a unibyte string, then converted to
> >> multibyte by string-as-multibyte (not by the not-yet-existing
> >> string-to-multibyte), and then inserted in a multibyte buffer.
>
> > Sorry, I don't think I understand the difference. What will we have
> > in the buffer after process output is converted as you describe in the
> > last paragraph above?
>
> Ok, here's an example (Latin-1 lang. env.).
>
> unibyte sequence (hex): 81 81 C0 C0
>                         result of conversion      display in multibyte buffer
> string-as-multibyte:    9E A1 81 C0 C0            \201À\300
> string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
> string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300
I find the terminology and the concepts confusing.
On the other hand, I understand the concept of encoding and decoding.
The following equivalences almost hold:
(string-as-multibyte str) == (decode-coding-string str 'internal)
(string-make-multibyte str) == (decode-coding-string str 'default)
(string-to-multibyte str) == (decode-coding-string str 'raw-text)
I said "almost" because:
1 - there is no `internal' coding-system as of now. In Emacs-21 we'd
use `emacs-mule' but for Emacs-22 it would be `utf-8-emacs'.
I'm still not sure what such a thing is useful for, tho (see
my other email).
2 - there is no `default' coding-system either. Or maybe
locale-coding-system is this default: if your locale is
latin-1 then that's latin-1. For non-8-bit locales,
I don't know what string-make-multibyte does.
3 - when called with a `raw-text' coding-system, decode-coding-string
returns a unibyte string, which is obviously not what we want here.
It might make sense for internal operations to return unibyte
strings for the `raw-text' case, but I was really surprised that
decode-coding-string would ever return a unibyte string.
I think avoiding string-FOO-multibyte and using decode-coding-string
instead would make things a lot more clear.
-- Stefan
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 0:20 ` Kenichi Handa
@ 2003-01-21 0:54 ` Stefan Monnier
2003-01-21 5:57 ` Eli Zaretskii
1 sibling, 0 replies; 49+ messages in thread
From: Stefan Monnier @ 2003-01-21 0:54 UTC
Cc: monnier+gnu/emacs
> > While we're at it, how about making string-as-multibyte obsolete ?
> > It's not used much and it has been abused many times in the past.
>
> It is useful for instance in this scenario. I don't
> remember a concrete example, but some package is doing this
> kind of thing.
>
> Read a file of weird encoding in a unibyte buffer. Parse
> the contents and run decode-coding-region on one chunk at a
> time with different coding systems. Extract some part from
> that unibyte buffer and insert it in a multibyte buffer. The
> last step is:
> (let ((str (buffer-substring FROM TO)))
> (save-excursion
> (set-buffer MULTIBYTE-BUF)
> (insert (string-as-multibyte str))))
Without a concrete example I don't find this scenario compelling.
E.g. I fail to see why it should do
decode-coding-region + buffer-substring + string-as-multibyte
rather than
buffer-substring + decode-coding-string
But even if the scenario is possible,
using (decode-coding-string str 'emacs-mule) doesn't seem much worse
than (string-as-multibyte str), so I'm not sure how relevant it is
to whether or not we should obsolete string-as-multibyte.
> > Also, I believe it's more or less equivalent to
> > (decode-coding-string str 'emacs-mule)
>
> Yes, for the moment. But, in emacs-unicode, we must change
> it to:
> (decode-coding-string str 'utf-8-emacs)
> On the other hand, code using string-as-multibyte doesn't
> have to change.
So I hereby suggest (define-coding-system-alias 'internal 'emacs-mule)
which we can happily change in Emacs-22 to `utf-8-emacs', such that
we will still have
(string-as-multibyte str) == (decode-coding-string str 'internal)
-- Stefan
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 0:10 ` Kenichi Handa
2003-01-21 0:45 ` Stefan Monnier
@ 2003-01-21 5:56 ` Eli Zaretskii
2003-01-21 6:38 ` Kenichi Handa
2003-01-22 10:00 ` Richard Stallman
2 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2003-01-21 5:56 UTC
Cc: emacs-devel
On Tue, 21 Jan 2003, Kenichi Handa wrote:
> > Sorry, I don't think I understand the difference. What will we have
> > in the buffer after process output is converted as you describe in the
> > last paragraph above?
>
> Ok, here's an example (Latin-1 lang. env.).
Thanks for taking time to explain these subtleties.
> (1) Reading a process output by raw-text into a multibyte
> buffer does AS conversion. I think this should do TO
> conversion to be consistent with (3).
I tend to agree, but without knowing the exact reason for the current
behavior, I fear that we will break something.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 0:20 ` Kenichi Handa
2003-01-21 0:54 ` Stefan Monnier
@ 2003-01-21 5:57 ` Eli Zaretskii
1 sibling, 0 replies; 49+ messages in thread
From: Eli Zaretskii @ 2003-01-21 5:57 UTC
Cc: emacs-devel
On Tue, 21 Jan 2003, Kenichi Handa wrote:
> In article <200301202055.h0KKtun11691@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> > While we're at it, how about making string-as-multibyte obsolete ?
> > It's not used much and it has been abused many times in the past.
>
> It is useful for instance in this scenario. I don't
> remember a concrete example, but some package is doing this
> kind of thing.
I think it might be Gnus.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 0:45 ` Stefan Monnier
@ 2003-01-21 6:01 ` Eli Zaretskii
2003-01-21 6:43 ` Kenichi Handa
2003-01-21 8:04 ` Kenichi Handa
1 sibling, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2003-01-21 6:01 UTC
Cc: emacs-devel
On Mon, 20 Jan 2003, Stefan Monnier wrote:
> > unibyte sequence (hex): 81 81 C0 C0
> >                         result of conversion      display in multibyte buffer
> > string-as-multibyte:    9E A1 81 C0 C0            \201À\300
> > string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
> > string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300
> [...]
> 3 - when called with a `raw-text' coding-system, decode-coding-string
> returns a unibyte string
I might be missing something, but I think you are wrong: the sequence
"9E A1 9E A1 C0 C0" is _not_ a unibyte string. For example, "9E A1" is
the multibyte encoding of the 81 byte.
> I think avoiding string-FOO-multibyte and using decode-coding-string
> instead would make things a lot more clear.
FWIW, I never use string-*-multibyte because I could never remember what
exactly each variant does.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 5:56 ` Eli Zaretskii
@ 2003-01-21 6:38 ` Kenichi Handa
0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-21 6:38 UTC
Cc: emacs-devel
In article <Pine.SUN.3.91.1030121075455.9650B-100000@is>, Eli Zaretskii <eliz@is.elta.co.il> writes:
> On Tue, 21 Jan 2003, Kenichi Handa wrote:
>> > Sorry, I don't think I understand the difference. What will we have
>> > in the buffer after process output is converted as you describe in the
>> > last paragraph above?
>>
>> Ok, here's an example (Latin-1 lang. env.).
> Thanks for taking time to explain these subtleties.
>> (1) Reading a process output by raw-text into a multibyte
>> buffer does AS conversion. I think this should do TO
>> conversion to be consistent with (3).
> I tend to agree, but without knowing the exact reason for the current
> behavior, I fear that we will break something.
I don't remember well now. :-(
Perhaps it was because there was a choice only between
string-as-multibyte and string-make-multibyte, and I thought
string-as-multibyte was less surprising than
string-make-multibyte. That is because the reason for
specifying raw-text or no-conversion for a process should be
that one doesn't want code conversion. If we used
string-make-multibyte, the result would be almost the same as
decoding by, for instance, iso-latin-1.
If we had string-to-multibyte at that time, I might have
used it.
---
Ken'ichi HANDA
handa@m17n.org
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 6:01 ` Eli Zaretskii
@ 2003-01-21 6:43 ` Kenichi Handa
0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-21 6:43 UTC
Cc: monnier+gnu/emacs
In article <Pine.SUN.3.91.1030121075724.9650D-100000@is>, Eli Zaretskii <eliz@is.elta.co.il> writes:
> On Mon, 20 Jan 2003, Stefan Monnier wrote:
>> > unibyte sequence (hex): 81 81 C0 C0
>> >                         result of conversion      display in multibyte buffer
>> > string-as-multibyte:    9E A1 81 C0 C0            \201À\300
>> > string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
>> > string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300
>> [...]
>> 3 - when called with a `raw-text' coding-system, decode-coding-string
>> returns a unibyte string
> I might be missing something, but I think you are wrong: the sequence
> "9E A1 9E A1 C0 C0" is _not_ a unibyte string.
I didn't write that it is a unibyte string. In the above
example, only the first line is a unibyte string. The
remaining lines show the result of unibyte->multibyte
conversion, so they are multibyte strings.
> For example, "9E A1" is the multibyte encoding of the 81
> byte.
Yes.
---
Ken'ichi HANDA
handa@m17n.org
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 0:45 ` Stefan Monnier
2003-01-21 6:01 ` Eli Zaretskii
@ 2003-01-21 8:04 ` Kenichi Handa
2003-01-21 15:02 ` Miles Bader
` (2 more replies)
1 sibling, 3 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-21 8:04 UTC
Cc: emacs-devel
In article <200301210045.h0L0jS812745@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>> unibyte sequence (hex): 81 81 C0 C0
>>                         result of conversion      display in multibyte buffer
>> string-as-multibyte:    9E A1 81 C0 C0            \201À\300
>> string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
>> string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300
> I find the terminology and the concepts confusing.
I agree that those names are not that intuitive, but the
first two were there before I noticed it. :-p
But in what sense are the concepts confusing?
> On the other hand, I understand the concept of encoding and decoding.
> The following equivalences almost hold:
> (string-as-multibyte str) == (decode-coding-string str 'internal)
> (string-make-multibyte str) == (decode-coding-string str 'default)
> (string-to-multibyte str) == (decode-coding-string str 'raw-text)
> I said "almost" because:
Please note that decode-coding-string also does eol
conversion. Using 'internal-unix, 'default-unix,
'raw-text-unix will make them more equivalent.
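Here is a simplified Python sketch of the EOL point. A bare coding-system name such as raw-text leaves the end-of-line type undecided, so decoding may also rewrite line ends. A -unix name leaves the bytes alone. The detection rule used here (any CRLF means dos, otherwise any CR means mac) is an assumption and is cruder than Emacs's own detection. `detect_eol` and `decode_eol` are hypothetical helpers that model only the EOL step:

```python
def detect_eol(data: bytes) -> str:
    """Crude EOL guess (assumption): CRLF -> dos, lone CR -> mac, else unix."""
    if b"\r\n" in data:
        return "dos"
    if b"\r" in data:
        return "mac"
    return "unix"


def decode_eol(data: bytes, coding: str) -> bytes:
    """Apply only the EOL part of decoding for a coding-system name.

    A name ending in -unix, -dos or -mac fixes the EOL type; a bare name
    leaves it undecided, so it is guessed from the data.
    """
    eol = None
    for suffix in ("unix", "dos", "mac"):
        if coding.endswith("-" + suffix):
            eol = suffix
            break
    if eol is None:
        eol = detect_eol(data)
    if eol == "dos":
        return data.replace(b"\r\n", b"\n")
    if eol == "mac":
        return data.replace(b"\r", b"\n")
    return data
```

In this model, decoding b"a\r\nb" with raw-text collapses the CRLF pair to LF. With raw-text-unix, the input comes back byte for byte.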
> 1 - there is no `internal' coding-system as of now. In Emacs-21 we'd
> use `emacs-mule' but for Emacs-22 it would be `utf-8-emacs'.
> I'm still not sure what such a thing is useful for, tho (see
> my other email).
Before we introduced eight-bit-XXXX,
(insert (string-as-multibyte UNIBYTE-STRING))
was the only way to preserve the original byte sequence in a
multibyte buffer.
But, as we now have eight-bit-XXXX, I agree that
string-as-multibyte is not that useful, string-to-multibyte
is better.
> 2 - there is no `default' coding-system either. Or maybe
> locale-coding-system is this default: if your locale is
> latin-1 then that's latin-1.
If one does not do set-language-environment,
locale-coding-system can be used as `default'.
> For non-8-bit locales, I don't know what
> string-make-multibyte does.
In that case, it does latin-1 decoding, ... yes, not that good.
> 3 - when called with a `raw-text' coding-system, decode-coding-string
> returns a unibyte string, which is obviously not what we want here.
> It might make sense for internal operations to return unibyte
> strings for the `raw-text' case, but I was really surprised that
> decode-coding-string would ever return a unibyte string.
I tend to agree that it is better that decode-coding-string
always return a multibyte string now.
> I think avoiding string-FOO-multibyte and using decode-coding-string
> instead would make things a lot more clear.
I think string-FOO-multibyte (and also string-FOO-unibyte)
are conceptually different from decoding (and encoding)
operations. It's difficult for me to explain it clearly,
but I'll try.
Decoding and encoding are the interface between Emacs and
the outer world.
Decoding is for converting an external byte sequence
(i.e. one belonging to the world outside Emacs) into Emacs'
representation.
Encoding is for converting Emacs' representation to a byte
sequence that is used outside Emacs.
But string-FOO-multi/unibyte are conversions within Emacs'
world.
And if one wants to insert the result of encode-coding-string
in a multibyte buffer (perhaps for some post-processing),
what should one do? If we have string-to-multibyte, we can
do this:
(insert (string-to-multibyte
(encode-coding-string MULTIBYTE-STRING CODING)))
If we don't have it, and provided that decode-coding-string
always returns a multibyte string, we must do:
(insert (decode-coding-string
(encode-coding-string MULTIBYTE-STRING CODING) 'raw-text-unix))
Isn't it very funny?
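A minimal Python sketch of why the choice matters for encoded output. It assumes a Latin-1 language environment and models each multibyte character as a (charset, code) pair. Python's latin-1 codec stands in for encode-coding-string here, and the helper names are hypothetical:

```python
from typing import List, Tuple

# Hypothetical model: a multibyte character is a (charset, code) pair.
Char = Tuple[str, int]


def eight_bit(b: int) -> Char:
    """Keep byte B as a character of its own (ascii or eight-bit-*)."""
    if b < 0x80:
        return ("ascii", b)
    return ("eight-bit-control" if b < 0xA0 else "eight-bit-graphic", b)


def to_multibyte(s: bytes) -> List[Char]:
    """TO conversion: every byte is preserved as a byte."""
    return [eight_bit(b) for b in s]


def make_multibyte_latin1(s: bytes) -> List[Char]:
    """MAKE conversion in a Latin-1 environment: A0-FF are read as Latin-1."""
    return [("latin-iso8859-1", b) if b >= 0xA0 else eight_bit(b) for b in s]


# Python's latin-1 codec stands in for (encode-coding-string "À" 'iso-latin-1).
encoded = "\u00c0".encode("latin-1")

print(to_multibyte(encoded))           # raw byte C0 kept as eight-bit-graphic
print(make_multibyte_latin1(encoded))  # byte C0 reinterpreted as the letter À
```

In this model, MAKE turns the encoded byte back into the letter À, so any post-processing would see text rather than the bytes that encoding produced. TO keeps C0 as an eight-bit-graphic character whose code is the original byte.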
By the way, I think the culprit of the current problem is
this Emacs doctrine:
Do unibyte<->multibyte conversion by "MAKE" by default.
Although this doctrine surely works for handling unibyte and
multibyte representation transparently, it makes Elisp
programmers very, very confused. And it is useful only for
people whose main charset is single-byte.
I am seriously considering changing it in emacs-unicode.
---
Ken'ichi HANDA
handa@m17n.org
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 8:04 ` Kenichi Handa
@ 2003-01-21 15:02 ` Miles Bader
2003-01-21 17:44 ` Stefan Monnier
2003-01-22 10:00 ` Richard Stallman
2 siblings, 0 replies; 49+ messages in thread
From: Miles Bader @ 2003-01-21 15:02 UTC
Cc: monnier+gnu/emacs
On Tue, Jan 21, 2003 at 05:04:37PM +0900, Kenichi Handa wrote:
> And if one wants to insert the result of encode-coding-string
> in a multibyte buffer (perhaps for some post-processing),
> what should one do? If we have string-to-multibyte, we can
> do this:
> (insert (string-to-multibyte
> (encode-coding-string MULTIBYTE-STRING CODING)))
> If we don't have it, and provided that decode-coding-string
> always returns a multibyte string, we must do:
> (insert (decode-coding-string
> (encode-coding-string MULTIBYTE-STRING CODING) 'raw-text-unix))
> Isn't it very funny?
Actually I find the second variant much _more_ clear, as it makes it very
obvious what's happening.
-Miles
--
`The suburb is an obsolete and contradictory form of human settlement'
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 8:04 ` Kenichi Handa
2003-01-21 15:02 ` Miles Bader
@ 2003-01-21 17:44 ` Stefan Monnier
2003-01-22 10:00 ` Richard Stallman
2 siblings, 0 replies; 49+ messages in thread
From: Stefan Monnier @ 2003-01-21 17:44 UTC
Cc: emacs-devel
> In article <200301210045.h0L0jS812745@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> >> unibyte sequence (hex): 81 81 C0 C0
> >>                         result of conversion      display in multibyte buffer
> >> string-as-multibyte:    9E A1 81 C0 C0            \201À\300
> >> string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
> >> string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300
>
> > I find the terminology and the concepts confusing.
>
> I agree that those names are not that intuitive, but the
> first two were there before I noticed it. :-p
> But in what sense are the concepts confusing?
The concept of string-as-multibyte made some sense in Emacs-20
when it was really "look under the hood: take the same bytes but
interpret them differently". In Emacs-21, this is not the case
any more so I don't really understand what's the intent behind it
other than emacs-mule decoding (that it might happen to come out of
some other decoding step rather than out of a file is not really
relevant, I think).
I think what I find confusing is that the name of those functions
implicitly says "take the string and give me the same one, but
just multibyte instead of unibyte", even though there's no unambiguous
way to have "the same one". So there has to be a choice of how
the conversion between unibyte and multibyte takes place, but this choice
is not clearly described by the functions' names.
> Please note that decode-coding-string also does eol
> conversion.
Sorry for my sloppiness.
> Using 'internal-unix, 'default-unix,
> 'raw-text-unix will make them more equivalent.
This should probably be `no-conversion' (or `binary'). Admittedly, it's
the same, but I think it carries the intent a bit better.
> But, as we now have eight-bit-XXXX, I agree that
> string-as-multibyte is not that useful, string-to-multibyte
> is better.
But they do different things and the name-difference does not
explain clearly the subtle distinction, so I think it's more
confusing than anything else.
> > 2 - there is no `default' coding-system either. Or maybe
> > locale-coding-system is this default: if your locale is
> > latin-1 then that's latin-1.
>
> If one does not do set-language-environment,
> locale-coding-system can be used as `default'.
And otherwise ? The mere fact that I don't know the answer to this
question seems like a good indication that pretty much nobody knows what
`string-make-multibyte' does, so anyone who uses it is most likely
using it wrong.
Luckily, it seems only ps-mule.el uses it (although much more
code uses the underlying nonascii-translation-table functionality).
> > 3 - when called with a `raw-text' coding-system, decode-coding-string
> > returns a unibyte string, which is obviously not what we want here.
> > It might make sense for internal operations to return unibyte
> > strings for the `raw-text' case, but I was really surprised that
> > decode-coding-string would ever return a unibyte string.
>
> I tend to agree that it is better that decode-coding-string
> always return a multibyte string now.
If it can be fixed, we can recommend (decode-coding-string str 'no-conversion)
rather than introducing a new function string-to-multibyte.
> I think string-FOO-multibyte (and also string-FOO-unibyte)
> are conceptually different from decoding (and encoding)
> operations. It's difficult for me to explain it clearly,
> but I'll try.
>
> Decoding and encoding are interface between Emacs and the
> outer world.
>
> Decoding is for converting an external byte sequence
> (i.e. belonging to a world out of Emacs) into Emacs'
> representation.
>
> Encoding is for converting Emacs' representation to a byte
> sequence that is used out of Emacs.
But the `emacs-mule' coding-system is used both inside and outside,
and same goes for `binary', so the distinction between inside and
outside is not very clear-cut.
I find it more helpful to think in terms of bytes and chars: unibyte
strings are sequences of bytes while multibyte strings are sequences
of chars. Converting between bytes and chars is the purpose of
coding-systems. In such a context, string-FOO-multibyte are obviously
just various forms of decoding, but the names don't give a good sense
of which decoding is used.
> And, if one wants to insert a result of encode-coding-string
> in a multibyte buffer (perhaps for some post-processing),
> what should he do? If we have string-to-multibyte, we can
> do this:
> (insert (string-to-multibyte
> (encode-coding-string MULTIBYTE-STRING CODING)))
> If we don't have it, and provided that decode-coding-string
> always returns a multibyte string, we must do:
> (insert (decode-coding-string
> (encode-coding-string MULTIBYTE-STRING CODING) 'raw-text-unix))
> Isn't it very funny?
Obviously, I agree with Miles that the second is much clearer (especially
if you replace `raw-text-unix' with `no-conversion'; well, I prefer `binary'
myself, since `no-conversion' is also a misnomer given that a conversion
does take place).
> By the way, I think the culprit of the current problem is
> this Emacs' doctrine:
> Do unibyte<->multibyte conversion by "MAKE" by default.
Since MAKE uses some kind of "default" related to the current language
environment, I think it's OK, except that it's not clear in what way
it's "related".
But of course, there should simply never be such a thing as "guess
what this unibyte stream translates into". The coding-system used to
decode unibyte into multibyte should always be "clearly" defined
(by the process's coding-system, the keyboard's coding-system, ...).
I.e. it is simply a bug to insert a unibyte string into a multibyte buffer
(and vice versa).
As for inserting a char between 128 and 256 into a multibyte buffer...
it should ideally always be treated as an eight-bit-foo char,
but I think that making such a change right now would not be wise
because there is still too much code which forgets to decode
its bytes into chars (and instead relies on the MAKE default to
turn those chars into latin-1 chars).
> Although this doctrine surely works for handling unibyte and
> multibyte representation transparently, it makes Elisp
> programmers very very confused. And it is useful only for
> people whose main charset is single-byte.
>
> I'm seriously considering changing it in emacs-unicode.
Might be a good idea for emacs-unicode indeed.
Stefan
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-20 1:52 ` Emacs-diffs Digest, Vol 2, Issue 28 Kenichi Handa
@ 2003-01-21 18:18 ` Richard Stallman
2003-01-28 0:32 ` Kenichi Handa
2003-01-21 18:18 ` Richard Stallman
1 sibling, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-01-21 18:18 UTC
Cc: emacs-devel
[...]
* process.c (read_process_output): Decide the multibyteness of
string given to a process filter by a coding system used for
decoding the process output.
But I don't remember the details now.
set-process-multibyte should be a good solution.
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-20 1:52 ` Emacs-diffs Digest, Vol 2, Issue 28 Kenichi Handa
2003-01-21 18:18 ` Richard Stallman
@ 2003-01-21 18:18 ` Richard Stallman
2003-01-27 12:20 ` Kenichi Handa
1 sibling, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-01-21 18:18 UTC
Cc: emacs-devel
If we want a multibyte sequence but each character contained
is one of ascii, eight-bit-control, and eight-bit-graphic
corresponding to the original bytes, we must read with
no-conversion, and insert the characters one by one as below:
(apply 'insert (string-to-list UNIBYTE-STRING))
Perhaps we should have a function, say, string-to-multibyte,
to make this possible.
(insert (string-to-multibyte UNIBYTE-STRING))
I think we do need this function string-to-multibyte, and its doc
string should carefully explain how it differs from
string-make-multibyte.
I don't remember why the current code works as above. I
think the behaviour Eli described is more consistent with
the behaviour of file reading.
Shall I change the code to do what Eli described (by introducing
the new function string-to-multibyte)?
Please do introduce string-to-multibyte. What other change do
you propose? I am not sure how what Eli wrote differs from the
current behavior.
By the way, it may be cleaner to have all these functions in
parallel, and to devote one section of the Info manual to the
differences between the MAKE, AS, and TO conversions.
string-make-multibyte
string-as-multibyte
string-to-multibyte
These three are all useful.
string-make-unibyte
string-as-unibyte
string-to-unibyte (perhaps the same as string-as-unibyte, or
it should signal an error if non-ascii,
non-eight-bit-XXX is contained).
I don't see a need to add string-to-unibyte.
buffer-make-multibyte
buffer-as-multibyte (same as (set-buffer-multibyte BUFFER t))
buffer-to-multibyte
I don't think buffer-make-multibyte and buffer-to-multibyte are
useful. What is useful is to have functions to operate on a region in
a multibyte buffer, transforming the text between these three
different representations. (Some of the 6 transformations may be
meaningless or impossible; we should only support the meaningful
ones.)
buffer-make-unibyte
buffer-as-unibyte (same as (set-buffer-multibyte BUFFER nil))
buffer-to-unibyte
I don't think any change here is worth making.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-20 20:55 ` Stefan Monnier
2003-01-21 0:20 ` Kenichi Handa
@ 2003-01-22 9:59 ` Richard Stallman
2003-01-22 14:12 ` Stefan Monnier
1 sibling, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-01-22 9:59 UTC
Cc: handa
While we're at it, how about making string-as-multibyte obsolete ?
It is not obsolete--there are reasons to use it.
I think avoiding string-FOO-multibyte and using decode-coding-string
instead would make things a lot more clear.
I don't see any advantage in the change.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 0:10 ` Kenichi Handa
2003-01-21 0:45 ` Stefan Monnier
2003-01-21 5:56 ` Eli Zaretskii
@ 2003-01-22 10:00 ` Richard Stallman
2003-01-22 14:12 ` Stefan Monnier
2 siblings, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-01-22 10:00 UTC
Cc: emacs-devel
unibyte sequence (hex): 81 81 C0 C0
                       result of conversion     display in multibyte buffer
string-as-multibyte:   9E A1 81 C0 C0           \201À\300
string-make-multibyte: 9E A1 9E A1 81 C0 81 C0  \201\201ÀÀ
string-to-multibyte:   9E A1 9E A1 C0 C0        \201\201\300\300
I think this example should go in the Lisp manual.
Could you add it?
(1) Reading a process output by raw-text into a multibyte
buffer does AS conversion. I think this should do TO
conversion to be consistent with (3).
That is a strong argument in favor. Does anyone see any arguments
against this change?
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-21 8:04 ` Kenichi Handa
2003-01-21 15:02 ` Miles Bader
2003-01-21 17:44 ` Stefan Monnier
@ 2003-01-22 10:00 ` Richard Stallman
2 siblings, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-01-22 10:00 UTC
Cc: monnier+gnu/emacs
By the way, I think the culprit of the current problem is
this Emacs' doctrine:
Do unibyte<->multibyte conversion by "MAKE" by default.
Although this doctrine surely works for handling unibyte and
multibyte representation transparently, it makes Elisp
programmers very very confused.
It is absolutely crucial to make unibyte and multibyte operation
interoperate smoothly for Latin character sets. We did this
so that people who used Emacs for European character sets
would not have trouble.
It is more important to make things easy for non-programmers
than to make it easy for programmers. So don't change this.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-22 10:00 ` Richard Stallman
@ 2003-01-22 14:12 ` Stefan Monnier
0 siblings, 0 replies; 49+ messages in thread
From: Stefan Monnier @ 2003-01-22 14:12 UTC
Cc: Kenichi Handa
> (1) Reading a process output by raw-text into a multibyte
> buffer does AS conversion. I think this should do TO
> conversion to be consistent with (3).
>
> That is a strong argument in favor.
100% agreement: AS corresponds to the emacs-mule coding-system, whereas
raw-text should be like TO.
Stefan
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-22 9:59 ` Richard Stallman
@ 2003-01-22 14:12 ` Stefan Monnier
2003-01-22 18:09 ` Eli Zaretskii
2003-01-24 5:43 ` Richard Stallman
0 siblings, 2 replies; 49+ messages in thread
From: Stefan Monnier @ 2003-01-22 14:12 UTC
Cc: Stefan Monnier
> While we're at it, how about making string-as-multibyte obsolete ?
>
> It is not obsolete--there are reasons to use it.
But it can be replaced by a call to decode-coding-string, so it is
not indispensable.
> I think avoiding string-FOO-multibyte and using decode-coding-string
> instead would make things a lot more clear.
>
> I don't see any advantage in the change.
Here is the reason why we should discourage the use of unibyte<->multibyte
conversions and recommend coding/decoding instead:
There is a lot of
confusion among Emacs hackers about "what's this MULE stuff" and "why
Emacs does conversions instead of keeping things as they are", typically
for users of latin-1 locales (but more generally any 8-bit locale)
where they don't understand the difference between bytes and chars.
This is of course why we introduced unibyte buffers in the first place:
a lot of code was not properly updated to MULE and was not doing
conversions where they're necessary.
So where does the unibyte<->multibyte stuff come in ? I think it
simply promotes the illusion that it is possible to "switch between
the two equivalent representations" although there's clearly no unambiguous
equivalence. So people end up with "oh, I have a unibyte thing here
and Emacs wants a multibyte thing instead, so I'll just make it
multibyte" using some kind of default encoding which "should work
most of the time".
If coders such as Eli and myself don't fully understand the semantics
of string-as-multibyte and string-make-multibyte (and the various ways
in which they are implicitly called), it's clear that those functions
should basically not be used by anyone.
Using decode-coding-string is just as easy and makes things much
more clear so we should encourage it.
Stefan
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-22 14:12 ` Stefan Monnier
@ 2003-01-22 18:09 ` Eli Zaretskii
2003-01-23 11:38 ` Richard Stallman
2003-01-24 5:43 ` Richard Stallman
1 sibling, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2003-01-22 18:09 UTC
Cc: emacs-devel
> From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
> Date: Wed, 22 Jan 2003 09:12:49 -0500
>
> If coders such as Eli and myself don't fully understand the semantics
> of string-as-multibyte and string-make-multibyte (and the various ways
> in which they are implicitly called), it's clear that those functions
> should basically not be used by anyone.
That is my opinion as well, FWIW. The string conversion functions
_are_ useful, but they employ lots of ad-hoc decisions to cope with
different, sometimes conflicting, goals. I've heard Handa-san many
times explaining what each one of the functions does, but I find
myself forgetting that too soon, and need to consult the code and
conduct experiments every time I bump into them.
By contrast, encode-coding-* and decode-coding-* are very
straightforward.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-22 18:09 ` Eli Zaretskii
@ 2003-01-23 11:38 ` Richard Stallman
2003-01-23 16:18 ` Stefan Monnier
2003-01-23 17:48 ` Eli Zaretskii
0 siblings, 2 replies; 49+ messages in thread
From: Richard Stallman @ 2003-01-23 11:38 UTC
Cc: monnier+gnu/emacs
By contrast, encode-coding-* and decode-coding-* are very
straightforward.
These functions, per se, are straightforward--but would the difference
between the various proposed coding systems be equally
straightforward?
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-23 11:38 ` Richard Stallman
@ 2003-01-23 16:18 ` Stefan Monnier
2003-01-24 17:16 ` Richard Stallman
2003-01-23 17:48 ` Eli Zaretskii
1 sibling, 1 reply; 49+ messages in thread
From: Stefan Monnier @ 2003-01-23 16:18 UTC
Cc: monnier+gnu/emacs
> By contrast, encode-coding-* and decode-coding-* are very
> straightforward.
> These functions, per se, are straightforward--but would the difference
> between the various proposed coding systems be equally
> straightforward?
I think people understand the difference between binary, emacs-mule, and
locale-coding-system. Probably not everybody does, but those who don't,
*really* can't understand string-FOO-multibyte either and should
thus stay away from it.
Stefan
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-23 11:38 ` Richard Stallman
2003-01-23 16:18 ` Stefan Monnier
@ 2003-01-23 17:48 ` Eli Zaretskii
1 sibling, 0 replies; 49+ messages in thread
From: Eli Zaretskii @ 2003-01-23 17:48 UTC
Cc: emacs-devel
> From: Richard Stallman <rms@gnu.org>
> Date: Thu, 23 Jan 2003 06:38:17 -0500
>
> These functions, per se, are straightforward--but would the difference
> between the various proposed coding systems be equally
> straightforward?
I'm not sure what you are asking. If you are concerned that people
would not grasp the effects of encoding/decoding text with ``obscure''
coding systems like no-conversion and raw-text, then I agree with
Stefan: these conversions are well-defined and can be understood upon
careful reading. (If the current docs don't do a good job
explaining those coding systems, we could improve that.)
The important point, to me, is that using en/decode-coding-*, you know
_exactly_ what will happen, since you specify the encoding. The
string-*-uni/multibyte functions, by contrast, make complicated
decisions about the encoding, so you need to memorize those decisions
to use the functions in a predictable manner. I find myself unable to
remember that; perhaps it's just me and my failing memory.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-22 14:12 ` Stefan Monnier
2003-01-22 18:09 ` Eli Zaretskii
@ 2003-01-24 5:43 ` Richard Stallman
2003-01-26 1:30 ` Stefan Monnier
1 sibling, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-01-24 5:43 UTC
Cc: monnier+gnu/emacs
Using decode-coding-string is just as easy and makes things much
more clear so we should encourage it.
I don't see how it makes anything clearer. It would tend to make the
documentation less clear. Right now there are two (perhaps in the
future three) functions, each of which has a doc string saying what it
does and what it's good for. Where would that info go if we make the
change you recommend?
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-23 16:18 ` Stefan Monnier
@ 2003-01-24 17:16 ` Richard Stallman
0 siblings, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-01-24 17:16 UTC
Cc: monnier+gnu/emacs
I think people understand the difference between binary, emacs-mule, and
locale-coding-system. Probably not everybody does, but those who don't,
*really* can't understand string-FOO-multibyte either and should
thus stay away from it.
I am not convinced this is true.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-24 5:43 ` Richard Stallman
@ 2003-01-26 1:30 ` Stefan Monnier
2003-01-27 2:31 ` Richard Stallman
2003-01-27 7:38 ` Kenichi Handa
0 siblings, 2 replies; 49+ messages in thread
From: Stefan Monnier @ 2003-01-26 1:30 UTC
Cc: Stefan Monnier
> Using decode-coding-string is just as easy and makes things much
> more clear so we should encourage it.
>
> I don't see how it makes anything clearer. It would tend to make the
> documentation less clear. Right now there are two (perhaps in the
> future three) functions, each of which has a doc string saying what it
> does and what it's good for. Where would that info go if we make the
> change you recommend?
The change I suggest is to obsolete those functions and to recommend
decode-coding-string instead, which has a perfectly good docstring
itself and so do each and every coding-system that you might want to
pass to that function.
I don't understand your question. When people use string-FOO-multibyte
it's generally because they don't understand what's going on and they
think "a char is a char is a char and I don't get this multibyte madness":
using decode-coding-string would force them to better understand what's
going on.
Stefan
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-26 1:30 ` Stefan Monnier
@ 2003-01-27 2:31 ` Richard Stallman
2003-01-27 7:38 ` Kenichi Handa
1 sibling, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-01-27 2:31 UTC
Cc: monnier+gnu/emacs
The change I suggest is to obsolete those functions and to recommend
decode-coding-string instead, which has a perfectly good docstring
itself and so do each and every coding-system that you might want to
pass to that function.
I've decided not to make this change. I tried to explain why; if it
isn't clear, I don't know how to explain better. I am sorry I cannot
communicate it better.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-26 1:30 ` Stefan Monnier
2003-01-27 2:31 ` Richard Stallman
@ 2003-01-27 7:38 ` Kenichi Handa
2003-01-27 14:12 ` Stefan Monnier
1 sibling, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-01-27 7:38 UTC
Cc: monnier+gnu/emacs
In article <200301260130.h0Q1Uo518101@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> I don't understand your question. When people use string-FOO-multibyte
> it's generally because they don't understand what's going on and they
> think "a char is a char is a char and I don't get this multibyte madness":
> using decode-coding-string would force them to better understand what's
> going on.
But I suspect that such people won't use the correct coding
system anyway. To use the correct coding system, they must
clearly understand what kind of multibyte string they want.
And if they understand that, there should be no difficulty
in using the correct string-FOO-multibyte function.
In one sense, it seems clean to use the concept of decoding
and encoding for all unibyte<->multibyte conversions
coherently. But, that hides what Emacs actually does.
You wrote:
> I find it more helpful to think in terms of bytes and chars:
Definitely. But,
> unibyte strings are sequences of bytes while multibyte
> strings are sequences of chars.
Unfortunately no.
Emacs can represent a character sequence both in unibyte and
multibyte string. Emacs can also represent a raw-byte
sequence both in unibyte and multibyte string. For a
multibyte string, which it represents (char-seq or byte-seq)
can be detected by what kind of characters it contains.
But, for a unibyte string, it's impossible, only the context
of how it is used decides that.
For string-make-multibyte, the input is a char-seq, and the
result of conversion is also a char-seq. So, the concept of
decoding is not applicable here.
For string-to-multibyte, the input is a byte-seq, and the
result of conversion is also a byte-seq. So, again, the
concept of decoding is not applicable either.
For string-as-multibyte, the input is a byte-seq, and the
result of conversion is a char-seq. So, only here, the
concept of decoding is also applicable.
I hope this explains why I insist on string-FOO-multibyte
functions.
By the way, it may be good to introduce coding system
aliases `internal' and `default', and write, for instance,
in the docstring of string-as-multibyte that the effect is
the same as (decode-coding-string UNIBYTE-STRING 'internal).
---
Ken'ichi HANDA
handa@m17n.org
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-21 18:18 ` Richard Stallman
@ 2003-01-27 12:20 ` Kenichi Handa
2003-01-29 0:05 ` Richard Stallman
0 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-01-27 12:20 UTC
Cc: emacs-devel
In article <E18b2yg-0007yl-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> string-make-multibyte
> string-as-multibyte
> string-to-multibyte
> These three are all useful.
> string-make-unibyte
> string-as-unibyte
> string-to-unibyte (perhaps the same as string-as-unibyte, or
> it should signal an error if non-ascii,
> non-eight-bit-XXX is contained).
> I don't see a need to add string-to-unibyte.
We have string-as/make-multibyte and string-as/make-unibyte.
If one finds string-to-multibyte, it's quite natural that he
also expects string-to-unibyte. Even if it is an alias of
string-as-unibyte, I think it's worth having it. And, it's
simpler to have it than saying that we don't have
string-to-unibyte because ... in some place.
And I think it's better that it signals an error as written
above.
> buffer-make-multibyte
> buffer-as-multibyte (same as (set-buffer-multibyte BUFFER t))
> buffer-to-multibyte
> I don't think buffer-make-multibyte and buffer-to-multibyte are
> useful. What is useful is to have functions to operate on a region in
> a multibyte buffer, transforming the text between these three
> different representations. (Some of the 6 transformations may be
> meaningless or impossible; we should only support the meaningful
> ones.)
I don't agree with having such a function. I think such a
case is where we have to use decode/encode-coding-region.
Eight-bit chars in a multibyte buffer actually represent
raw-bytes. Then the operation of turning them to characters
is "decoding", not transforming.
---
Ken'ichi HANDA
handa@m17n.org
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-27 7:38 ` Kenichi Handa
@ 2003-01-27 14:12 ` Stefan Monnier
2003-01-29 11:23 ` Kenichi Handa
0 siblings, 1 reply; 49+ messages in thread
From: Stefan Monnier @ 2003-01-27 14:12 UTC
Cc: monnier+gnu/emacs
> In one sense, it seems clean to use the concept of decoding
> and encoding for all unibyte<->multibyte conversions
> coherently. But, that hides what Emacs actually does.
You mean that string-FOO-multibyte uses special-cased code
and that there is thus a difference of efficiency ?
> > unibyte strings are sequences of bytes while multibyte
> > strings are sequences of chars.
> Unfortunately no.
I don't think there is any "truth" here. There are simply different
ways to look at the same thing.
Stefan
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-21 18:18 ` Richard Stallman
@ 2003-01-28 0:32 ` Kenichi Handa
2003-01-28 12:35 ` Kim F. Storm
2003-03-03 18:59 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman
0 siblings, 2 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-28 0:32 UTC
Cc: emacs-devel
In article <E18b2yf-0007yS-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> [...]
> * process.c (read_process_output): Decide the multibyteness of
> string given to a process filter by a coding system used for
> decoding the process output.
> But I don't remember the details now.
> set-process-multibyte should be a good solution.
I think we have extrapolated the behaviour of process I/O
from that of file I/O. But set-process-multibyte breaks it.
It's something like declaring the multibyteness of a process
but we don't declare that of a file. So it's not easy to
extrapolate the exact semantics of set-process-multibyte.
It is fairly clear for the case that a process has a filter
and its coding system is no-conversion or raw-text. But what
should we do with the other coding systems (e.g. Latin-1,
iso-2022-jp)? If a process is unibyte, what kind of string
should the filter get? Should we suppress text decoding,
as when inserting a file into a unibyte buffer?
And if the process has no filter but has a buffer, what to
do for a unibyte process that has a multibyte buffer, or for
a multibyte process that has a unibyte buffer?
And what should we do with process-send-string/region?
I think it is better to keep extrapolating the behaviour of
process reading/sending from file reading/writing.
For inserting process output into a buffer, there's no
difficulty to extrapolate the behaviour.
For a filter, although we don't have a function something
like string-from-file, the most resembling code will be
this.
(with-temp-buffer
(insert-file-contents FILE)
(buffer-string))
A string given to a process filter must be the same as the
result of that code, which means that
default-enable-multibyte-characters decides the
multibyteness, and if it is nil, character conversion except
for end-of-line conversion is suppressed.
The only question is when to check
default-enable-multibyte-characters. When a process is
created, or just before calling a filter? I think the
former is more like file I/O. And it may be ok to have a
function set-process-filter-multibyte which can change the
multibyteness of a string to a filter on the way.
Or, was the intention of set-process-multibyte actually
set-process-filter-multibyte?
---
Ken'ichi HANDA
handa@m17n.org
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-28 0:32 ` Kenichi Handa
@ 2003-01-28 12:35 ` Kim F. Storm
2003-02-10 8:15 ` set-process-filter-multibyte and etc Kenichi Handa
2003-03-03 18:59 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman
1 sibling, 1 reply; 49+ messages in thread
From: Kim F. Storm @ 2003-01-28 12:35 UTC
Cc: emacs-devel
Kenichi Handa <handa@m17n.org> writes:
> I think it is better to keep extrapolating the behaviour of
> process reading/sending from file reading/writing.
I agree.
> For a filter, although we don't have a function something
> like string-from-file, the most resembling code will be
> this.
>
> (with-temp-buffer
> (insert-file-contents FILE)
> (buffer-string))
>
> A string given to a process filter must be the same as the
> result of that code, which means that
> default-enable-multibyte-characters decides the
> multibyteness, and if it is nil, character conversion except
> for end-of-line conversion is suppressed.
>
> The only question is when to check
> default-enable-multibyte-characters. When a process is
> created, or just before calling a filter? I think the
> former is more like file I/O. And it may be ok to have a
> function set-process-filter-multibyte which can change the
> multibyteness of a string to a filter on the way.
Good points. Also, there could be a new `:multibyte BOOL' argument to
make-network-process to initialize the filter multibyteness of the new
process; specifying this would override the setting of
default-enable-multibyte-characters.
>
> Or, was the intention of set-process-multibyte actually
> set-process-filter-multibyte?
At least, that was the problem I was looking at when I suggested it,
so yes.
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-27 12:20 ` Kenichi Handa
@ 2003-01-29 0:05 ` Richard Stallman
0 siblings, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-01-29 0:05 UTC
Cc: emacs-devel
>> I don't see a need to add string-to-unibyte.
> We have string-as/make-multibyte and string-as/make-unibyte.
> If one finds string-to-multibyte, it's quite natural that he
> also expects string-to-unibyte. Even if it is an alias of
> string-as-unibyte, I think it's worth having it. And, it's
> simpler to have it than saying that we don't have
> string-to-unibyte because ... in some place.
I don't think we should add functions merely for completeness.
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
2003-01-27 14:12 ` Stefan Monnier
@ 2003-01-29 11:23 ` Kenichi Handa
0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-01-29 11:23 UTC
Cc: monnier+gnu/emacs
In article <200301271412.h0REClJ30624@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>> In one sense, it seems clean to use the concept of decoding
>> and encoding for all unibyte<->multibyte conversions
>> coherently. But, that hides what Emacs actually does.
> You mean that string-FOO-multibyte uses special-cased code
> and that there is thus a difference of efficiency ?
Yes. string-FOO-multibyte are more efficient than
decode-coding-string. But that is not the point.
>> > unibyte strings are sequences of bytes while multibyte
>> > strings are sequences of chars.
>> Unfortunately no.
> I don't think there is any "truth" here. There are simply different
> ways to look at the same thing.
I don't understand why you think my explanation is not
true.
You wrote:
>> Converting between bytes and chars is the purpose of
>> coding-systems.
OK, then the region resulting from encode-coding-region is a
sequence of bytes, not chars, even in a multibyte
buffer. Thus, the string returned by buffer-substring on that
region (let's call it MULTI) is also a byte sequence.
Using (string-to-unibyte MULTI) to get the same byte
sequence in unibyte form is fine as long as we adopt my
interpretation of that function.
But doing (encode-coding-string MULTI 'raw-text) is
conceptually broken because MULTI is already a byte
sequence.
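Handa's reading of string-to-unibyte can be sketched as a pair of conversions between bytes and character codes. In this hypothetical C model (`to_multibyte` and `to_unibyte` are illustrative names, not Emacs functions), a multibyte string is an array of character codes in which codes 0-255 are assumed to stand for raw bytes. Converting such a byte-only string back to unibyte is lossless, while a string containing a real character above 255 is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not Emacs source.  A multibyte string is an
   array of character codes; codes 0-255 are assumed to stand for
   raw bytes (ASCII plus the "eight-bit" characters).  */

/* Like string-to-multibyte: each byte becomes the char with the same code.  */
void
to_multibyte (const unsigned char *bytes, size_t len, int *chars)
{
  assert ((bytes != NULL && chars != NULL) || len == 0);
  for (size_t i = 0; i < len; i++)
    chars[i] = bytes[i];
}

/* Handa's reading of string-to-unibyte: valid only when every char is
   a raw byte.  Returns false, writing nothing, if some char is a real
   character outside 0-255.  */
bool
to_unibyte (const int *chars, size_t len, unsigned char *bytes)
{
  for (size_t i = 0; i < len; i++)
    if (chars[i] < 0 || chars[i] > 255)
      return false;
  for (size_t i = 0; i < len; i++)
    bytes[i] = (unsigned char) chars[i];
  return true;
}
```

Under this model, the "byte sequence in a multibyte buffer" that Handa describes round-trips through `to_unibyte` unchanged, which is why no coding system needs to be involved.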
---
Ken'ichi HANDA
handa@m17n.org
* set-process-filter-multibyte and etc.
2003-01-28 12:35 ` Kim F. Storm
@ 2003-02-10 8:15 ` Kenichi Handa
2003-02-10 14:57 ` Kim F. Storm
2003-02-20 1:27 ` Tak Ota
0 siblings, 2 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-02-10 8:15 UTC
Cc: emacs-devel
In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
>> The only question is when to check
>> default-enable-multibyte-characters. When a process is
>> created, or just before calling a filter? I think the
>> former is more like file I/O. And it may be ok to have a
>> function set-process-filter-multibyte which can change the
>> multibyteness of a string to a filter on the way.
> Good points. Also, there could be a new `:multibyte BOOL' argument to
> make-network-process to initialize the filter multibyteness of the new
> process; specifying this would override the setting of
> default-enable-multibyte-characters.
>> Or, was the intention of set-process-multibyte actually
>> set-process-filter-multibyte?
> At least, that was the problem I was looking at when I suggested it,
> so yes.
I've just installed changes for set-process-filter-multibyte
and related functions. I added the following to etc/NEWS. Could people
please fix my English?
---
Ken'ichi HANDA
handa@m17n.org
** New function `set-process-filter-multibyte' sets the multibyteness
of the strings passed to a process's filter.
** New function `process-filter-multibyte-p' returns t if
the strings passed to a process's filter are multibyte.
** A process's filter function is called with a multibyte string if
the filter's multibyteness is t. The multibyteness is determined by the
value of `default-enable-multibyte-characters' when the process is
created, and can be changed later with `set-process-filter-multibyte'.
** If a process's coding system is raw-text or no-conversion and its
buffer is multibyte, the process's output is first converted
to multibyte by `string-to-multibyte' and then inserted in the buffer.
Previously, it was converted to multibyte by `string-as-multibyte',
which was not compatible with the behaviour of file reading.
** New function `string-to-multibyte' converts a unibyte string to a
multibyte string with the same individual character codes.
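The difference between the two conversions named in these NEWS entries can be illustrated with a toy model. It assumes, purely for illustration, a UTF-8 internal representation and handles only two-byte sequences; the internal encoding of Emacs at the time was different. The function names are hypothetical. The `string-to-multibyte` model keeps one character per byte with the same code, whereas the `string-as-multibyte` model reinterprets the bytes, so a valid sequence can collapse into a single character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models, not Emacs source.  UTF-8 (two-byte sequences
   only) stands in for the internal multibyte representation.  */

/* Like string-to-multibyte: one byte -> one char with the same code.  */
size_t
model_string_to_multibyte (const unsigned char *in, size_t len, int *out)
{
  assert (out != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    out[i] = in[i];
  return len;
}

/* Like string-as-multibyte: reinterpret the bytes as the internal
   representation, so a valid sequence becomes a single char.  */
size_t
model_string_as_multibyte (const unsigned char *in, size_t len, int *out)
{
  size_t n = 0;

  assert (out != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] >= 0xC2 && in[i] <= 0xDF
          && i + 1 < len && in[i + 1] >= 0x80 && in[i + 1] <= 0xBF)
        {
          out[n++] = ((in[i] & 0x1F) << 6) | (in[i + 1] & 0x3F);
          i++;   /* the continuation byte is consumed too */
        }
      else
        out[n++] = in[i];   /* not a valid sequence: keep the byte's code */
    }
  return n;
}
```

For the bytes 0xC3 0xA9, the first model yields two characters (0xC3, 0xA9) and the second yields one (0xE9), so the number of characters inserted depends on which conversion is used.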
* Re: set-process-filter-multibyte and etc.
2003-02-10 8:15 ` set-process-filter-multibyte and etc Kenichi Handa
@ 2003-02-10 14:57 ` Kim F. Storm
2003-02-11 0:15 ` Kenichi Handa
2003-02-20 1:27 ` Tak Ota
1 sibling, 1 reply; 49+ messages in thread
From: Kim F. Storm @ 2003-02-10 14:57 UTC
Cc: emacs-devel
Kenichi Handa <handa@m17n.org> writes:
> In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
> >> The only question is when to check
> >> default-enable-multibyte-characters. When a process is
> >> created, or just before calling a filter? I think the
> >> former is more like file I/O. And it may be ok to have a
> >> function set-process-filter-multibyte which can change the
> >> multibyteness of a string to a filter on the way.
>
> > Good points. Also, there could be a new `:multibyte BOOL' argument to
> > make-network-process to initialize the filter multibyteness of the new
> > process; specifying this would override the setting of
> > default-enable-multibyte-characters.
>
> >> Or, was the intention of set-process-multibyte actually
> >> set-process-filter-multibyte?
>
> > At least, that was the problem I was looking at when I suggested it,
> > so yes.
>
> I've just installed changes for set-process-filter-multibyte
> and etc.
Great. I've fixed a few typos and doc strings.
If a process buffer is specified to start-process or make-network-process,
it would seem logical to assume that the process filter will
eventually insert the string into that buffer.
So I wonder whether it would make sense to let the default filter
multibyteness depend on the multibyteness of the BUFFER argument to
start-process and make-network-process (if specified and non-nil)?
And only revert to default-enable-multibyte-characters if BUFFER is nil.
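Kim's suggested default could be expressed as a small decision function. This is a hypothetical sketch (`buffer_model` and `initial_filter_multibyte` are invented names): the buffer's own multibyteness wins when a BUFFER is given, and `default-enable-multibyte-characters` is consulted only when BUFFER is nil.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer's multibyte flag.  */
struct buffer_model
{
  bool enable_multibyte_characters;
};

/* Default filter multibyteness under Kim's suggestion: follow BUFFER
   when one is given, since the filter will probably insert the string
   there; otherwise fall back to default-enable-multibyte-characters.  */
bool
initial_filter_multibyte (const struct buffer_model *buffer,
                          bool default_enable_multibyte)
{
  if (buffer != NULL)
    return buffer->enable_multibyte_characters;
  return default_enable_multibyte;
}
```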
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: set-process-filter-multibyte and etc.
2003-02-10 14:57 ` Kim F. Storm
@ 2003-02-11 0:15 ` Kenichi Handa
0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-02-11 0:15 UTC
Cc: emacs-devel
In article <5xwuk8che8.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
> Kenichi Handa <handa@m17n.org> writes:
>> In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
>> >> The only question is when to check
>> >> default-enable-multibyte-characters. When a process is
>> >> created, or just before calling a filter? I think the
>> >> former is more like file I/O. And it may be ok to have a
>> >> function set-process-filter-multibyte which can change the
>> >> multibyteness of a string to a filter on the way.
>>
>> > Good points. Also, there could be a new `:multibyte BOOL' argument to
>> > make-network-process to initialize the filter multibyteness of the new
>> > process; specifying this would override the setting of
>> > default-enable-multibyte-characters.
>>
>> >> Or, was the intention of set-process-multibyte actually
>> >> set-process-filter-multibyte?
>>
>> > At least, that was the problem I was looking at when I suggested it,
>> > so yes.
>>
>> I've just installed changes for set-process-filter-multibyte
>> and etc.
> Great. I've fixed a few typos and doc strings.
Thank you.
> If a process buffer is specified to start-process or make-network-process,
> it would seem logical to assume that the process filter will
> eventually insert the string into that buffer.
> So I wonder whether it would make sense to let the default filter
> multibyteness depend on the multibyteness of the BUFFER argument to
> start-process and make-network-process (if specified and non-nil)?
> And only revert to default-enable-multibyte-characters if BUFFER is nil.
That sounds reasonable. If there's no objection, I'll
change the code.
---
Ken'ichi HANDA
handa@m17n.org
* Re: set-process-filter-multibyte and etc.
2003-02-10 8:15 ` set-process-filter-multibyte and etc Kenichi Handa
2003-02-10 14:57 ` Kim F. Storm
@ 2003-02-20 1:27 ` Tak Ota
2003-02-20 1:56 ` Kenichi Handa
1 sibling, 1 reply; 49+ messages in thread
From: Tak Ota @ 2003-02-20 1:27 UTC
Cc: emacs-devel
2003-02-10 Kenichi Handa <handa@m17n.org>
* process.c (QCfilter_multibyte): New variable.
(setup_process_coding_systems): New function.
setup_process_coding_systems should check the validity of inch and
outch since they can be (-1) when the buffer is killed and
setup_process_coding_systems is called downstream of exec_sentinel.
Here is the actual case I encountered.
EMACS! memset + 65 bytes
setup_coding_system(int 0x114f893c, coding_system * 0x11377064) line 3401 + 22 bytes
setup_process_coding_systems(int 0x41d41780) line 605 + 23 bytes
Fset_process_buffer(int 0x41d41780, int 0x11342404) line 855 + 9 bytes
Ffuncall(int 0x00000003, int * 0x0082f448) line 2744 + 25 bytes
Fbyte_code(int 0x31bd5834, int 0x41beb580, int 0x00000004) line 709 + 16 bytes
funcall_lambda(int 0x41b7a440, int 0x00000002, int * 0x0082f684) line 2929 + 43 bytes
Ffuncall(int 0x00000003, int * 0x0082f680) line 2788 + 20 bytes
Fapply(int 0x00000002, int * 0x0082f6cc) line 2247 + 13 bytes
apply1(int 0x11b30894, int 0x51c8d95c) line 2500 + 11 bytes
read_process_output_call() line 4374 + 29 bytes
internal_condition_case_1(int (void)* 0x01050bfd read_process_output_call(void), int 0x51c8d954, int 0x1135c214, int (void)* 0x01052894 exec_sentinel_error_handler(void)) line 1392 + 7 bytes
exec_sentinel() line 5971 + 98 bytes
status_notify() line 6068 + 13 bytes
wait_reading_process_input(int 0x0000001e, int 0x00000000, int 0x0fffffff, int 0x00000001) line 3939
sit_for(int 0x0000001e, int 0x00000000, int 0x00000001, int 0x00000001, int 0x00000000) line 6249 + 21 bytes
read_char(int 0x00000001, int 0x00000003, int * 0x0082fa7c, int 0x11342404, int * 0x0082fbdc) line 2681 + 36 bytes
read_key_sequence(int * 0x0082fd40, int 0x0000001e, int 0x11342404, int 0x00000000, int 0x00000001, int 0x00000001) line 8566 + 41 bytes
command_loop_1() line 1492 + 27 bytes
internal_condition_case(int (void)* 0x0101409f command_loop_1(void), int 0x1135c214, int (void)* 0x01013c71 cmd_error(void)) line 1351 + 3 bytes
command_loop_2() line 1290 + 21 bytes
internal_catch(int 0x113517c4, int (void)* 0x01013f67 command_loop_2(void), int 0x11342404) line 1112 + 7 bytes
command_loop() line 1269 + 23 bytes
recursive_edit_1() line 985 + 5 bytes
Frecursive_edit() line 1042
main() line 1661
EMACS! mainCRTStartup + 180 bytes
_start() line 136
KERNEL32! 77ea847c()
-Tak
Mon, 10 Feb 2003 17:15:16 +0900 (JST): Kenichi Handa <handa@m17n.org> wrote:
> In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes:
> >> The only question is when to check
> >> default-enable-multibyte-characters. When a process is
> >> created, or just before calling a filter? I think the
> >> former is more like file I/O. And it may be ok to have a
> >> function set-process-filter-multibyte which can change the
> >> multibyteness of a string to a filter on the way.
>
> > Good points. Also, there could be a new `:multibyte BOOL' argument to
> > make-network-process to initialize the filter multibyteness of the new
> > process; specifying this would override the setting of
> > default-enable-multibyte-characters.
>
> >> Or, was the intention of set-process-multibyte actually
> >> set-process-filter-multibyte?
>
> > At least, that was the problem I was looking at when I suggested it,
> > so yes.
>
> I've just installed changes for set-process-filter-multibyte
> and etc. I added the followings to etc/NEWS. Could people
> please fix my English.
>
> ---
> Ken'ichi HANDA
> handa@m17n.org
>
> ** New function `set-process-filter-multibyte' sets the multibyteness
> of a string given to a process's filter.
>
> ** New function `process-filter-multibyte-p' returns t if
> a string given to a process's filter is multibyte.
>
> ** A filter function of a process is called with a multibyte string if
> the filter's multibyteness is t. That multibyteness is decided by the
> value of `default-enable-multibyte-characters' when the process is
> created and can be changed later by `set-process-filter-multibyte'.
>
> ** If a process's coding system is raw-text or no-conversion and its
> buffer is multibyte, the output of the process is at first converted
> to multibyte by `string-to-multibyte' then inserted in the buffer.
> Previously, it was converted to multibyte by `string-as-multibyte',
> which was not compatible with the behaviour of file reading.
>
> ** New function `string-to-multibyte' converts a unibyte string to a
> multibyte string with the same individual character codes.
* Re: set-process-filter-multibyte and etc.
2003-02-20 1:27 ` Tak Ota
@ 2003-02-20 1:56 ` Kenichi Handa
2003-02-20 2:44 ` Tak Ota
0 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-02-20 1:56 UTC
Cc: emacs-devel
In article <20030219.172710.60851987.Takaaki.Ota@am.sony.com>, Tak Ota <Takaaki.Ota@am.sony.com> writes:
> setup_process_coding_systems should check the validity of inch and
> outch since they can be (-1) when the buffer is killed and
> setup_process_coding_systems is called downstream of exec_sentinel.
Thank you for the bug report. I've just installed the
attached change. Does it fix the problem?
---
Ken'ichi HANDA
handa@m17n.org
2003-02-20 Kenichi Handa <handa@m17n.org>
* process.c (setup_process_coding_systems): If the process's
in/out descriptor is -1, do nothing.
Index: process.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/process.c,v
retrieving revision 1.399
retrieving revision 1.400
diff -u -c -r1.399 -r1.400
cvs server: conflicting specifications of output style
*** process.c 10 Feb 2003 13:51:43 -0000 1.399
--- process.c 20 Feb 2003 01:54:09 -0000 1.400
***************
*** 598,603 ****
--- 598,606 ----
    int inch = XINT (p->infd);
    int outch = XINT (p->outfd);

+   if (inch < 0 || outch < 0)
+     return;
+ 
    if (!proc_decode_coding_system[inch])
      proc_decode_coding_system[inch]
        = (struct coding_system *) xmalloc (sizeof (struct coding_system));
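The guard added by the patch can be exercised in isolation with a small model. In this sketch `decode_table`, `MAX_FDS` and `setup_coding_model` are hypothetical stand-ins for the `proc_decode_coding_system` table and setup_process_coding_systems; it shows only that a descriptor of -1 (in Tak's report, after the buffer was killed) makes the function return early instead of being used as an array index.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FDS 16

/* Hypothetical stand-ins for struct coding_system and the
   proc_decode_coding_system table, which is indexed by descriptor.  */
struct coding_model { int eol_type; };
static struct coding_model *decode_table[MAX_FDS];

/* Returns 1 if a decoder slot for INFD exists afterwards, else 0.  */
int
setup_coding_model (int infd, int outfd)
{
  /* The fix: either descriptor can be -1, and -1 must not be used
     as an index into the table.  */
  if (infd < 0 || outfd < 0)
    return 0;
  assert (infd < MAX_FDS && outfd < MAX_FDS);
  if (!decode_table[infd])
    decode_table[infd] = calloc (1, sizeof *decode_table[infd]);
  return decode_table[infd] != NULL;
}
```

Without the early return, `decode_table[-1]` would read and write outside the array, which matches the crash inside setup_coding_system in the backtrace above.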
* Re: set-process-filter-multibyte and etc.
2003-02-20 1:56 ` Kenichi Handa
@ 2003-02-20 2:44 ` Tak Ota
0 siblings, 0 replies; 49+ messages in thread
From: Tak Ota @ 2003-02-20 2:44 UTC
Cc: emacs-devel
Thu, 20 Feb 2003 10:56:04 +0900 (JST): Kenichi Handa <handa@m17n.org> wrote:
> In article <20030219.172710.60851987.Takaaki.Ota@am.sony.com>, Tak Ota <Takaaki.Ota@am.sony.com> writes:
> > setup_process_coding_systems should check the validity of inch and
> > outch since they can be (-1) when the buffer is killed and
> > setup_process_coding_systems is called downstream of exec_sentinel.
>
> Thank you for the bug report. I've just installed the
> attached change. Does it fix the problem?
Yes it does. Thank you.
-Tak
* Re: Emacs-diffs Digest, Vol 2, Issue 28
2003-01-28 0:32 ` Kenichi Handa
2003-01-28 12:35 ` Kim F. Storm
@ 2003-03-03 18:59 ` Richard Stallman
1 sibling, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-03-03 18:59 UTC
Cc: emacs-devel
> It is fairly clear for the case that a process has a filter
> and coding system is no-conversion or raw-text. But what to
> do with the other coding systems (e.g. Latin-1,
> iso-2022-jp)? If a process is unibyte, what kind of string
> should the filter get? Should we suppress text decoding
> as in the case of inserting a file into a unibyte buffer?
I don't see any other meaningful thing to do in that case. Decoding
produces, in general, data that cannot go in a unibyte string.
> And if the process has no filter but has a buffer, what to
> do for a unibyte process that has a multibyte buffer, or for
> a multibyte process that has a unibyte buffer?
These are unreasonable cases, so there is no need to strain too hard to
make them do useful things. For the unibyte process with a multibyte
buffer, just turn off decoding, and let the data get converted to
multibyte when inserted in the buffer. For the multibyte process with
unibyte buffer, it can convert the multibyte decoded text to unibyte.
If people feel that isn't useful, they shouldn't use it.
> And what to do with process-send-string/region?
If the process is multibyte, encode the text (after converting it
first to multibyte if necessary). If the process is unibyte, just
convert the text to unibyte.
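The rules RMS gives for the mixed cases and for sending can be summarized as a decision table. This is a hypothetical C sketch of the proposal as stated in this message (the types and functions are invented for illustration), not a description of what was eventually implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposal; names are illustrative.  */
struct recv_plan
{
  bool decode;                   /* run the coding system on process output */
  bool to_multibyte_on_insert;   /* raw bytes become multibyte on insertion */
  bool to_unibyte_on_insert;     /* decoded text becomes unibyte on insertion */
};

/* Receiving into a buffer (no filter).  */
struct recv_plan
plan_receive (bool process_multibyte, bool buffer_multibyte)
{
  struct recv_plan p = { false, false, false };

  /* Decoded text cannot, in general, go in a unibyte string, so only
     a multibyte process decodes.  */
  p.decode = process_multibyte;
  if (!process_multibyte && buffer_multibyte)
    p.to_multibyte_on_insert = true;
  if (process_multibyte && !buffer_multibyte)
    p.to_unibyte_on_insert = true;
  return p;
}

enum send_plan { SEND_ENCODE, SEND_TO_UNIBYTE };

/* Sending: a multibyte process gets the text made multibyte if needed
   and then encoded; a unibyte process just gets the text as unibyte.  */
enum send_plan
plan_send (bool process_multibyte)
{
  return process_multibyte ? SEND_ENCODE : SEND_TO_UNIBYTE;
}
```

In the unibyte-process, multibyte-buffer case, for instance, the model skips decoding and marks the raw bytes for conversion to multibyte at insertion time.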
> A string given to a process filter must be the same as the
> result of that code, which means that
> default-enable-multibyte-characters decides the
> multibyteness, and if it is nil, character conversion except
> for end-of-line conversion is suppressed.
default-enable-multibyte-characters should not directly control
the multibyteness of the string. The process's multibyte flag
should control that.
But it is a good idea to let default-enable-multibyte-characters
play a role in initializing the process's multibyte flag.
> Or, was the intention of set-process-multibyte actually
> set-process-filter-multibyte?
You could think of it that way, because the case where
set-process-multibyte is useful is when the process has a filter.
Thread overview: 49+ messages
[not found] <E18ZDQC-0003mt-02@monty-python.gnu.org>
2003-01-18 0:48 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman
2003-01-18 12:35 ` Kim F. Storm
2003-01-18 12:40 ` Eli Zaretskii
2003-01-20 0:49 ` Richard Stallman
2003-01-20 2:29 ` unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] Kenichi Handa
2003-01-20 18:48 ` Eli Zaretskii
2003-01-20 20:55 ` Stefan Monnier
2003-01-21 0:20 ` Kenichi Handa
2003-01-21 0:54 ` Stefan Monnier
2003-01-21 5:57 ` Eli Zaretskii
2003-01-22 9:59 ` Richard Stallman
2003-01-22 14:12 ` Stefan Monnier
2003-01-22 18:09 ` Eli Zaretskii
2003-01-23 11:38 ` Richard Stallman
2003-01-23 16:18 ` Stefan Monnier
2003-01-24 17:16 ` Richard Stallman
2003-01-23 17:48 ` Eli Zaretskii
2003-01-24 5:43 ` Richard Stallman
2003-01-26 1:30 ` Stefan Monnier
2003-01-27 2:31 ` Richard Stallman
2003-01-27 7:38 ` Kenichi Handa
2003-01-27 14:12 ` Stefan Monnier
2003-01-29 11:23 ` Kenichi Handa
2003-01-21 0:10 ` Kenichi Handa
2003-01-21 0:45 ` Stefan Monnier
2003-01-21 6:01 ` Eli Zaretskii
2003-01-21 6:43 ` Kenichi Handa
2003-01-21 8:04 ` Kenichi Handa
2003-01-21 15:02 ` Miles Bader
2003-01-21 17:44 ` Stefan Monnier
2003-01-22 10:00 ` Richard Stallman
2003-01-21 5:56 ` Eli Zaretskii
2003-01-21 6:38 ` Kenichi Handa
2003-01-22 10:00 ` Richard Stallman
2003-01-22 14:12 ` Stefan Monnier
2003-01-20 1:52 ` Emacs-diffs Digest, Vol 2, Issue 28 Kenichi Handa
2003-01-21 18:18 ` Richard Stallman
2003-01-28 0:32 ` Kenichi Handa
2003-01-28 12:35 ` Kim F. Storm
2003-02-10 8:15 ` set-process-filter-multibyte and etc Kenichi Handa
2003-02-10 14:57 ` Kim F. Storm
2003-02-11 0:15 ` Kenichi Handa
2003-02-20 1:27 ` Tak Ota
2003-02-20 1:56 ` Kenichi Handa
2003-02-20 2:44 ` Tak Ota
2003-03-03 18:59 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman
2003-01-21 18:18 ` Richard Stallman
2003-01-27 12:20 ` Kenichi Handa
2003-01-29 0:05 ` Richard Stallman