* Re: Emacs-diffs Digest, Vol 2, Issue 28
  [not found] <E18ZDQC-0003mt-02@monty-python.gnu.org>
  49+ messages in thread
From: Richard Stallman @ 2003-01-18 0:48 UTC

    ! The string argument is normally a multibyte string, except:
    ! - if the process' input coding system is no-conversion or raw-text,
    !   it is a unibyte string (the non-converted input), or else

Is this really the right way for it to work?
Should the choice of unibyte or multibyte string
be tied in this way to the choice of coding system?

If you want multibyte strings "without decoding", would emacs-mule
give you that?

Handa and Eli, what do you think?

* Re: Emacs-diffs Digest, Vol 2, Issue 28
From: Kim F. Storm @ 2003-01-18 12:35 UTC
Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     ! The string argument is normally a multibyte string, except:
>     ! - if the process' input coding system is no-conversion or raw-text,
>     !   it is a unibyte string (the non-converted input), or else
>
> Is this really the right way for it to work?
> Should the choice of unibyte or multibyte string
> be tied in this way to the choice of coding system?

Maybe provide a `set-process-multibyte' analogue to `set-buffer-multibyte'.

> If you want multibyte strings "without decoding", would emacs-mule
> give you that?

That's what string-as-multibyte is for, I think.

--
Kim F. Storm <storm@cua.dk> http://www.cua.dk

* Re: Emacs-diffs Digest, Vol 2, Issue 28
From: Eli Zaretskii @ 2003-01-18 12:40 UTC
Cc: emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Fri, 17 Jan 2003 19:48:02 -0500
>
>     ! The string argument is normally a multibyte string, except:
>     ! - if the process' input coding system is no-conversion or raw-text,
>     !   it is a unibyte string (the non-converted input), or else
>
> Is this really the right way for it to work?
> Should the choice of unibyte or multibyte string
> be tied in this way to the choice of coding system?

I think users expect this behavior when they use no-conversion or
raw-text.

> If you want multibyte strings "without decoding", would emacs-mule
> give you that?

I don't think so. emacs-mule is for reading text that is already in
the internal Emacs representation, like auto-save files.

AFAIK, raw-text does decode the text in the sense that 8-bit
characters which have their 8th bit set are decoded into the
eight-bit-* charsets.

* Re: Emacs-diffs Digest, Vol 2, Issue 28
From: Richard Stallman @ 2003-01-20 0:49 UTC
Cc: emacs-devel

    > Is this really the right way for it to work?
    > Should the choice of unibyte or multibyte string
    > be tied in this way to the choice of coding system?

    Maybe provide a `set-process-multibyte' analogue to `set-buffer-multibyte'.

I think that is a good idea.

* unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Kenichi Handa @ 2003-01-20 2:29 UTC
Cc: emacs-devel

In article <3405-Sat18Jan2003154003+0200-eliz@is.elta.co.il>, "Eli Zaretskii" <eliz@is.elta.co.il> writes:

>> If you want multibyte strings "without decoding", would emacs-mule
>> give you that?

> I don't think so. emacs-mule is for reading text that is already in
> the internal Emacs representation, like auto-save files.

Yes.

> AFAIK, raw-text does decode the text in the sense that
> 8-bit characters which have their 8th bit set are decoded
> into the eight-bit-* charsets.

Yes, but that is only in the case that you read a file into
a multibyte buffer by raw-text. This conversion from raw
byte sequence to multibyte form is what is done by
string-to-multibyte, which I wrote about in the previous mail.

On process reading, if raw-text is used, the process output
is at first read as a unibyte string, the string is converted
to multibyte by string-as-multibyte (not by the not-yet-existing
string-to-multibyte), then inserted in a multibyte buffer.

I don't remember why the current code does as above. I
think the behaviour Eli described is more consistent with the
behaviour of file reading. Shall I change the code to do what Eli
wrote (by introducing the new function string-to-multibyte)?

By the way, it may be clean to have all these functions in
parallel, and to spare one section in info describing the
difference between the MAKE, AS, and TO conversions.

    string-make-multibyte
    string-as-multibyte
    string-to-multibyte
    string-make-unibyte
    string-as-unibyte
    string-to-unibyte (perhaps the same as string-as-unibyte, or it
        should signal an error if non-ascii, non-eight-bit-XXX is
        contained)

    buffer-make-multibyte
    buffer-as-multibyte (same as (set-buffer-multibyte BUFFER t))
    buffer-to-multibyte
    buffer-make-unibyte
    buffer-as-unibyte (same as (set-buffer-multibyte BUFFER nil))
    buffer-to-unibyte

---
Ken'ichi HANDA
handa@m17n.org

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Eli Zaretskii @ 2003-01-20 18:48 UTC
Cc: emacs-devel

> Date: Mon, 20 Jan 2003 11:29:51 +0900 (JST)
> From: Kenichi Handa <handa@m17n.org>
>
> > AFAIK, raw-text does decode the text in the sense that
> > 8-bit characters which have their 8th bit set are decoded
> > into the eight-bit-* charsets.
>
> Yes, but that is only in the case that you read a file into
> a multibyte buffer by raw-text. This conversion from raw
> byte sequence to multibyte form is what is done by
> string-to-multibyte, which I wrote about in the previous mail.
>
> On process reading, if raw-text is used, the process output
> is at first read as a unibyte string, the string is converted
> to multibyte by string-as-multibyte (not by the not-yet-existing
> string-to-multibyte), then inserted in a multibyte buffer.

Sorry, I don't think I understand the difference. What will we have
in the buffer after process output is converted as you describe in the
last paragraph above?

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Stefan Monnier @ 2003-01-20 20:55 UTC
Cc: handa

> > Date: Mon, 20 Jan 2003 11:29:51 +0900 (JST)
> > From: Kenichi Handa <handa@m17n.org>
> >
> > > AFAIK, raw-text does decode the text in the sense that
> > > 8-bit characters which have their 8th bit set are decoded
> > > into the eight-bit-* charsets.
> >
> > Yes, but that is only in the case that you read a file into
> > a multibyte buffer by raw-text. This conversion from raw
> > byte sequence to multibyte form is what is done by
> > string-to-multibyte, which I wrote about in the previous mail.
> >
> > On process reading, if raw-text is used, the process output
> > is at first read as a unibyte string, the string is converted
> > to multibyte by string-as-multibyte (not by the not-yet-existing
> > string-to-multibyte), then inserted in a multibyte buffer.
>
> Sorry, I don't think I understand the difference. What will we have
> in the buffer after process output is converted as you describe in the
> last paragraph above?

While we're at it, how about making string-as-multibyte obsolete?
It's not used much and it has been abused many times in the past.
Also, I believe it's more or less equivalent to

    (decode-coding-string str 'emacs-mule)

I'd be interested to know what the differences are, if any.

        Stefan

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Kenichi Handa @ 2003-01-21 0:20 UTC
Cc: emacs-devel

In article <200301202055.h0KKtun11691@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> While we're at it, how about making string-as-multibyte obsolete?
> It's not used much and it has been abused many times in the past.

It is useful, for instance, in this scenario. I don't
remember a concrete example, but some package is doing this
kind of thing.

Read a file of weird encoding into a unibyte buffer. Parse
the contents and do decode-coding-region one chunk at a time
with different coding systems. Extract some part from that
unibyte buffer and insert it in a multibyte buffer. The
last step is:

    (let ((str (buffer-substring FROM TO)))
      (save-excursion
        (set-buffer MULTIBYTE-BUF)
        (insert (string-as-multibyte str))))

> Also, I believe it's more or less equivalent to
>     (decode-coding-string str 'emacs-mule)

Yes, for the moment. But, in emacs-unicode, we must change
it to:

    (decode-coding-string str 'utf-8-emacs)

On the other hand, we don't have to change code that uses
string-as-multibyte.

---
Ken'ichi HANDA
handa@m17n.org

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Stefan Monnier @ 2003-01-21 0:54 UTC
Cc: monnier+gnu/emacs

> > While we're at it, how about making string-as-multibyte obsolete?
> > It's not used much and it has been abused many times in the past.
>
> It is useful, for instance, in this scenario. I don't
> remember a concrete example, but some package is doing this
> kind of thing.
>
> Read a file of weird encoding into a unibyte buffer. Parse
> the contents and do decode-coding-region one chunk at a time
> with different coding systems. Extract some part from that
> unibyte buffer and insert it in a multibyte buffer. The
> last step is:
>     (let ((str (buffer-substring FROM TO)))
>       (save-excursion
>         (set-buffer MULTIBYTE-BUF)
>         (insert (string-as-multibyte str))))

Without a concrete example I don't find this scenario compelling.
E.g. I fail to see why it should do

    decode-coding-region + buffer-substring + string-as-multibyte

rather than

    buffer-substring + decode-coding-string

But even if the scenario is possible, using
(decode-coding-string str 'emacs-mule) doesn't seem much worse than
(string-as-multibyte str), so I'm not sure how relevant it is to
whether or not we should obsolete string-as-multibyte.

> > Also, I believe it's more or less equivalent to
> >     (decode-coding-string str 'emacs-mule)
>
> Yes, for the moment. But, in emacs-unicode, we must change
> it to:
>     (decode-coding-string str 'utf-8-emacs)
> On the other hand, we don't have to change code that uses
> string-as-multibyte.

So I hereby suggest

    (define-coding-system-alias 'internal 'emacs-mule)

which we can happily change in Emacs-22 to `utf-8-emacs', such
that we will still have

    (string-as-multibyte str) == (decode-coding-string str 'internal)

-- Stefan

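The alias idea in the message above can be tried out with a minimal sketch like the following. It assumes a recent Emacs in which the `utf-8-emacs' coding system exists; the alias name `sketch-internal' and the helper `sketch-decode-internal' are made up for illustration and stand in for the proposed `internal', which this sketch does not claim Emacs defines.

```elisp
;; Hypothetical alias name, standing in for the proposed `internal'.
;; Assumption: `utf-8-emacs' is available in the running Emacs.
(define-coding-system-alias 'sketch-internal 'utf-8-emacs)

(defun sketch-decode-internal (bytes)
  "Decode the unibyte string BYTES through the `sketch-internal' alias."
  (decode-coding-string bytes 'sketch-internal))

;; The two bytes C3 A9 are the UTF-8 form of é.
(sketch-decode-internal (unibyte-string #xC3 #xA9))
;; => "é"
```

Callers that name the alias would not need to change if the alias were later pointed at a different internal coding system, which is the property Stefan is after.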
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Eli Zaretskii @ 2003-01-21 5:57 UTC
Cc: emacs-devel

On Tue, 21 Jan 2003, Kenichi Handa wrote:

> In article <200301202055.h0KKtun11691@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> > While we're at it, how about making string-as-multibyte obsolete ?
> > It's not used much and it has been abused many times in the past.
>
> It is useful for instance in this scenario. I don't
> remember a concrete example, but some package is doing this
> kind of thing.

I think it might be Gnus.

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Richard Stallman @ 2003-01-22 9:59 UTC
Cc: handa

    While we're at it, how about making string-as-multibyte obsolete ?

It is not obsolete--there are reasons to use it.

    I think avoiding string-FOO-multibyte and using decode-coding-string
    instead would make things a lot more clear.

I don't see any advantage in the change.

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Stefan Monnier @ 2003-01-22 14:12 UTC
Cc: Stefan Monnier

>     While we're at it, how about making string-as-multibyte obsolete ?
>
> It is not obsolete--there are reasons to use it.

But it can be replaced by a call to decode-coding-string, so it is not
indispensable.

>     I think avoiding string-FOO-multibyte and using decode-coding-string
>     instead would make things a lot more clear.
>
> I don't see any advantage in the change.

Here is the reason why we should discourage the use of
unibyte<->multibyte conversions and recommend coding/decoding instead:

There is a lot of confusion among Emacs hackers about "what's this MULE
stuff" and "why Emacs does conversions instead of keeping things as they
are", typically for users of latin-1 locales (but more generally any
8-bit locale) where they don't understand the difference between bytes
and chars. This is of course why we introduced unibyte buffers in the
first place: a lot of code was not properly updated to MULE and was not
doing conversions where they're necessary.

So where does the unibyte<->multibyte stuff come in? I think it simply
promotes the illusion that it is possible to "switch between the two
equivalent representations" although there's clearly no unambiguous
equivalence. So people end up with "oh, I have a unibyte thing here
and Emacs wants a multibyte thing instead, so I'll just make it
multibyte" using some kind of default encoding which "should work most
of the time".

If coders such as Eli and myself don't fully understand the semantics
of string-as-multibyte and string-make-multibyte (and the various ways
in which they are implicitly called), it's clear that those functions
should basically not be used by anyone.

Using decode-coding-string is just as easy and makes things much
more clear so we should encourage it.

        Stefan

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Eli Zaretskii @ 2003-01-22 18:09 UTC
Cc: emacs-devel

> From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
> Date: Wed, 22 Jan 2003 09:12:49 -0500
>
> If coders such as Eli and myself don't fully understand the semantics
> of string-as-multibyte and string-make-multibyte (and the various ways
> in which they are implicitly called), it's clear that those functions
> should basically not be used by anyone.

That is my opinion as well, FWIW.

The string conversion functions _are_ useful, but they employ lots of
ad-hoc decisions to cope with different, sometimes conflicting, goals.
I've heard Handa-san many times explaining what each one of the
functions does, but I find myself forgetting that too soon, and need to
consult the code and conduct experiments every time I bump into them.

By contrast, encode-coding-* and decode-coding-* are very
straightforward.

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Richard Stallman @ 2003-01-23 11:38 UTC
Cc: monnier+gnu/emacs

    By contrast, encode-coding-* and decode-coding-* are very
    straightforward.

These functions, per se, are straightforward--but would the difference
between the various proposed coding systems be equally
straightforward?

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Stefan Monnier @ 2003-01-23 16:18 UTC
Cc: monnier+gnu/emacs

>     By contrast, encode-coding-* and decode-coding-* are very
>     straightforward.
> These functions, per se, are straightforward--but would the difference
> between the various proposed coding systems be equally
> straightforward?

I think people understand the difference between binary, emacs-mule,
and locale-coding-system. Probably not everybody does, but those who
don't, *really* can't understand string-FOO-multibyte either and should
thus stay away from it.

        Stefan

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Richard Stallman @ 2003-01-24 17:16 UTC
Cc: monnier+gnu/emacs

    I think people understand the difference between binary, emacs-mule,
    and locale-coding-system. Probably not everybody does, but those who
    don't, *really* can't understand string-FOO-multibyte either and should
    thus stay away from it.

I am not convinced this is true.

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Eli Zaretskii @ 2003-01-23 17:48 UTC
Cc: emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Thu, 23 Jan 2003 06:38:17 -0500
>
> These functions, per se, are straightforward--but would the difference
> between the various proposed coding systems be equally
> straightforward?

I'm not sure what you are asking. If you are concerned that people
would not grasp the effects of encoding/decoding text with ``obscure''
coding systems like no-conversion and raw-text, then I agree with
Stefan: these conversions are well-defined and can be understood upon
careful reading. (If the current docs don't do a good job explaining
those coding systems, we could improve that.)

The important point, to me, is that using en/decode-coding-*, you know
_exactly_ what will happen, since you specify the encoding. The
string-*-uni/multibyte functions, by contrast, make complicated
decisions about the encoding, so you need to memorize those decisions
to use the functions in a predictable manner. I find myself unable to
remember that; perhaps it's just me and my failing memory.

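The point about naming the encoding explicitly can be seen in a small sketch. It assumes a recent Emacs that provides `unibyte-string' and the `utf-8' and `iso-latin-1' coding systems; the helper name `sketch-decode-both' is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper: the same bytes, decoded with two coding
;; systems that the caller names explicitly.
(defun sketch-decode-both (bytes)
  "Decode unibyte BYTES as UTF-8 and as Latin-1; return both results."
  (list (decode-coding-string bytes 'utf-8)
        (decode-coding-string bytes 'iso-latin-1)))

(sketch-decode-both (unibyte-string #xC3 #xA9))
;; => ("é" "Ã©")
```

Nothing about the result depends on the language environment or on a remembered rule: the first element is one character and the second is two, purely because of the coding system passed in each call.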
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Richard Stallman @ 2003-01-24 5:43 UTC
Cc: monnier+gnu/emacs

    Using decode-coding-string is just as easy and makes things much
    more clear so we should encourage it.

I don't see how it makes anything clearer. It would tend to make the
documentation less clear. Right now there are two (perhaps in the
future three) functions, each of which has a doc string saying what it
does and what it's good for. Where would that info go if we make the
change you recommend?

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Stefan Monnier @ 2003-01-26 1:30 UTC
Cc: Stefan Monnier

>     Using decode-coding-string is just as easy and makes things much
>     more clear so we should encourage it.
>
> I don't see how it makes anything clearer. It would tend to make the
> documentation less clear. Right now there are two (perhaps in the
> future three) functions, each of which has a doc string saying what it
> does and what it's good for. Where would that info go if we make the
> change you recommend?

The change I suggest is to obsolete those functions and to recommend
decode-coding-string instead, which has a perfectly good docstring
itself and so do each and every coding-system that you might want to
pass to that function.

I don't understand your question. When people use string-FOO-multibyte
it's generally because they don't understand what's going on and they
think "a char is a char is a char and I don't get this multibyte madness":
using decode-coding-string would force them to better understand what's
going on.

        Stefan

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Richard Stallman @ 2003-01-27 2:31 UTC
Cc: monnier+gnu/emacs

    The change I suggest is to obsolete those functions and to recommend
    decode-coding-string instead, which has a perfectly good docstring
    itself and so do each and every coding-system that you might want to
    pass to that function.

I've decided not to make this change. I tried to explain why; if it
isn't clear, I don't know how to explain better. I am sorry I cannot
communicate it better.

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Kenichi Handa @ 2003-01-27 7:38 UTC
Cc: monnier+gnu/emacs

In article <200301260130.h0Q1Uo518101@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> I don't understand your question. When people use string-FOO-multibyte
> it's generally because they don't understand what's going on and they
> think "a char is a char is a char and I don't get this multibyte madness":
> using decode-coding-string would force them to better understand what's
> going on.

But I suspect that such people won't use the correct coding
system anyway. To use the correct coding system, they must
clearly understand what kind of multibyte string they want.
And if they understand that, there should be no difficulty
in using the correct string-FOO-multibyte function.

In one sense, it seems clean to use the concept of decoding
and encoding for all unibyte<->multibyte conversions
coherently. But, that hides what Emacs actually does.

You wrote:
> I find it more helpful to think in terms of bytes and chars:

Definitely. But,

> unibyte strings are sequences of bytes while multibyte
> strings are sequences of chars.

Unfortunately no. Emacs can represent a character sequence
both in unibyte and multibyte string. Emacs can also
represent a raw-byte sequence both in unibyte and multibyte
string.

For a multibyte string, which it represents (char-seq or
byte-seq) can be detected by what kind of characters it
contains. But, for a unibyte string, it's impossible; only
the context of how it is used decides that.

For string-make-multibyte, the input is a char-seq, and the
result of conversion is also a char-seq. So, the concept of
decoding is not applicable here.

For string-to-multibyte, the input is a byte-seq, and the
result of conversion is also a byte-seq. So, again, the
concept of decoding is not applicable either.

For string-as-multibyte, the input is a byte-seq, and the
result of conversion is a char-seq. So, only here, the
concept of decoding is also applicable.

I hope this explains why I insist on string-FOO-multibyte
functions.

By the way, it may be good to introduce coding system
aliases `internal' and `default', and write, for instance,
in the docstring of string-as-multibyte that the effect is
the same as (decode-coding-string UNIBYTE-STRING 'internal).

---
Ken'ichi HANDA
handa@m17n.org

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Stefan Monnier @ 2003-01-27 14:12 UTC
Cc: monnier+gnu/emacs

> In one sense, it seems clean to use the concept of decoding
> and encoding for all unibyte<->multibyte conversions
> coherently. But, that hides what Emacs actually does.

You mean that string-FOO-multibyte uses special-cased code
and that there is thus a difference in efficiency?

> > unibyte strings are sequences of bytes while multibyte
> > strings are sequences of chars.
> Unfortunately no.

I don't think there is any "truth" here. There are simply different
ways to look at the same thing.

        Stefan

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Kenichi Handa @ 2003-01-29 11:23 UTC
Cc: monnier+gnu/emacs

In article <200301271412.h0REClJ30624@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> In one sense, it seems clean to use the concept of decoding
>> and encoding for all unibyte<->multibyte conversions
>> coherently. But, that hides what Emacs actually does.

> You mean that string-FOO-multibyte uses special-cased code
> and that there is thus a difference in efficiency?

Yes. string-FOO-multibyte are more efficient than
decode-coding-string. But, that is not the point.

>> > unibyte strings are sequences of bytes while multibyte
>> > strings are sequences of chars.
>> Unfortunately no.

> I don't think there is any "truth" here. There are simply different
> ways to look at the same thing.

I don't understand why you think my explanation is
not true.

You wrote:
>> Converting between bytes and chars is the purpose of
>> coding-systems.

Ok, then the resulting region of encode-coding-region is a
sequence of bytes, not chars, even if it's a multibyte
buffer. Thus, the string returned by buffer-substring on that
region (let's name it MULTI) is also a byte sequence.

Using (string-to-unibyte MULTI) to get the same byte
sequence but in unibyte form is ok as long as we adopt my
interpretation of that function. But, doing
(encode-coding-string MULTI 'raw-text) is conceptually
broken because MULTI is already a byte sequence.

---
Ken'ichi HANDA
handa@m17n.org

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Kenichi Handa @ 2003-01-21 0:10 UTC
Cc: emacs-devel

In article <6480-Mon20Jan2003214849+0200-eliz@is.elta.co.il>, "Eli Zaretskii" <eliz@is.elta.co.il> writes:

>> On process reading, if raw-text is used, the process output
>> is at first read as a unibyte string, the string is converted
>> to multibyte by string-as-multibyte (not by the not-yet-existing
>> string-to-multibyte), then inserted in a multibyte buffer.

> Sorry, I don't think I understand the difference. What will we have
> in the buffer after process output is converted as you describe in the
> last paragraph above?

Ok, here's an example (Latin-1 lang. env.).

unibyte sequence (hex): 81 81 C0 C0

                        result of conversion      display in multibyte buffer
string-as-multibyte:    9E A1 81 C0 C0            \201À\300
string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300

(1) Reading process output by raw-text into a multibyte
    buffer does AS conversion. I think this should do TO
    conversion to be consistent with (3).

(2) Reading a file by raw-text (resulting in a unibyte
    buffer) and copying the contents into a multibyte buffer
    does MAKE conversion. This is Emacs' default
    unibyte->multibyte conversion.

(3) Inserting a file by raw-text in a multibyte buffer does
    TO conversion.

---
Ken'ichi HANDA
handa@m17n.org

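For readers who want to experiment with the TO conversion on the same four bytes, here is a minimal sketch. It assumes a recent Emacs, where `string-to-multibyte' and `string-to-unibyte' both exist; the internal byte layout there differs from the Emacs 21 layout shown in the table above, so the sketch checks only properties that do not depend on that layout. The helper name `sketch-raw-bytes-round-trip' is hypothetical.

```elisp
;; Hypothetical helper; uses only built-in functions.
(defun sketch-raw-bytes-round-trip (raw)
  "Describe how the unibyte string RAW survives `string-to-multibyte'.
Return (RAW-MULTIBYTE-P RESULT-MULTIBYTE-P RESULT-LENGTH ROUND-TRIP-P)."
  (let ((multi (string-to-multibyte raw)))
    (list (multibyte-string-p raw)
          (multibyte-string-p multi)
          (length multi)
          ;; Converting back must give the original bytes.
          (equal (string-to-unibyte multi) raw))))

(sketch-raw-bytes-round-trip (unibyte-string #x81 #x81 #xC0 #xC0))
;; => (nil t 4 t)
```

Each input byte becomes exactly one character of the multibyte string, and converting back recovers the original bytes, which matches the byte-seq to byte-seq reading of TO given elsewhere in the thread.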
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
From: Stefan Monnier @ 2003-01-21 0:45 UTC
Cc: emacs-devel

> In article <6480-Mon20Jan2003214849+0200-eliz@is.elta.co.il>, "Eli Zaretskii" <eliz@is.elta.co.il> writes:
> >> On process reading, if raw-text is used, the process output
> >> is at first read as a unibyte string, the string is converted
> >> to multibyte by string-as-multibyte (not by the not-yet-existing
> >> string-to-multibyte), then inserted in a multibyte buffer.
>
> > Sorry, I don't think I understand the difference. What will we have
> > in the buffer after process output is converted as you describe in the
> > last paragraph above?
>
> Ok, here's an example (Latin-1 lang. env.).
>
> unibyte sequence (hex): 81 81 C0 C0
>
>                         result of conversion      display in multibyte buffer
> string-as-multibyte:    9E A1 81 C0 C0            \201À\300
> string-make-multibyte:  9E A1 9E A1 81 C0 81 C0   \201\201ÀÀ
> string-to-multibyte:    9E A1 9E A1 C0 C0         \201\201\300\300

I find the terminology and the concepts confusing.
On the other hand, I understand the concept of encoding and decoding.
The following equivalences almost hold:

    (string-as-multibyte str)   == (decode-coding-string str 'internal)
    (string-make-multibyte str) == (decode-coding-string str 'default)
    (string-to-multibyte str)   == (decode-coding-string str 'raw-text)

I said "almost" because:

1 - there is no `internal' coding-system as of now. In Emacs-21 we'd
    use `emacs-mule' but for Emacs-22 it would be `utf-8-emacs'.
    I'm still not sure what such a thing is useful for, tho (see my
    other email).

2 - there is no `default' coding-system either. Or maybe
    locale-coding-system is this default: if your locale is latin-1
    then that's latin-1. For non-8-bit locales, I don't know what
    string-make-multibyte does.

3 - when called with a `raw-text' coding-system, decode-coding-string
    returns a unibyte string, which is obviously not what we want here.
    It might make sense for internal operations to return unibyte
    strings for the `raw-text' case, but I was really surprised that
    decode-coding-string would ever return a unibyte string.

I think avoiding string-FOO-multibyte and using decode-coding-string
instead would make things a lot more clear.

-- Stefan

* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 0:45 ` Stefan Monnier @ 2003-01-21 6:01 ` Eli Zaretskii 2003-01-21 6:43 ` Kenichi Handa 2003-01-21 8:04 ` Kenichi Handa 1 sibling, 1 reply; 49+ messages in thread From: Eli Zaretskii @ 2003-01-21 6:01 UTC (permalink / raw) Cc: emacs-devel On Mon, 20 Jan 2003, Stefan Monnier wrote: > > unibyte sequence (hex): 81 81 C0 C0 > > result of conversion display in multbyte buffer > > string-as-multibyte: 9E A1 81 C0 C0 \201À\300 > > string-make-multibyte: 9E A1 9E A1 81 C0 81 C0 \201\201ÀÀ > > string-to-multibyte: 9E A1 9E A1 C0 C0 \201\201\300\300 > [...] > 3 - when called with a `raw-text' coding-system, decode-coding-string > returns a unibyte string I might be missing something, but I think you are wrong: the sequence "9E A1 9E A1 C0 C0" is _not_ a unibyte string. For example, "9E A1" is the multibyte encoding of the 81 byte. > I think avoiding string-FOO-multibyte and using decode-coding-string > instead would make things a lot more clear. FWIW, I never use string-*-multibyte because I could never remember what exactly does each variant do. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 6:01 ` Eli Zaretskii @ 2003-01-21 6:43 ` Kenichi Handa 0 siblings, 0 replies; 49+ messages in thread From: Kenichi Handa @ 2003-01-21 6:43 UTC (permalink / raw) Cc: monnier+gnu/emacs In article <Pine.SUN.3.91.1030121075724.9650D-100000@is>, Eli Zaretskii <eliz@is.elta.co.il> writes: > On Mon, 20 Jan 2003, Stefan Monnier wrote: >> > unibyte sequence (hex): 81 81 C0 C0 >> > result of conversion display in multibyte buffer >> > string-as-multibyte: 9E A1 81 C0 C0 \201À\300 >> > string-make-multibyte: 9E A1 9E A1 81 C0 81 C0 \201\201ÀÀ >> > string-to-multibyte: 9E A1 9E A1 C0 C0 \201\201\300\300 >> [...] >> 3 - when called with a `raw-text' coding-system, decode-coding-string >> returns a unibyte string > I might be missing something, but I think you are wrong: the sequence > "9E A1 9E A1 C0 C0" is _not_ a unibyte string. I didn't write that it is a unibyte string. In the above example, only the first line is the unibyte string. The remaining lines show the result of unibyte->multibyte conversion, thus they are multibyte strings. > For example, "9E A1" is the multibyte encoding of the 81 > byte. Yes. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 0:45 ` Stefan Monnier 2003-01-21 6:01 ` Eli Zaretskii @ 2003-01-21 8:04 ` Kenichi Handa 2003-01-21 15:02 ` Miles Bader ` (2 more replies) 1 sibling, 3 replies; 49+ messages in thread From: Kenichi Handa @ 2003-01-21 8:04 UTC (permalink / raw) Cc: emacs-devel In article <200301210045.h0L0jS812745@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> unibyte sequence (hex): 81 81 C0 C0 >> result of conversion display in multbyte buffer >> string-as-multibyte: 9E A1 81 C0 C0 \201À\300 >> string-make-multibyte: 9E A1 9E A1 81 C0 81 C0 \201\201ÀÀ >> string-to-multibyte: 9E A1 9E A1 C0 C0 \201\201\300\300 > I find the terminology and the concepts confusing. I agree that those names are not that intuitive, but the first two were there before I noticed it. :-p But, in what sense, the concepts are confusing? > On the other hand, I understand the concept of encoding and decoding. > The following equivalences almost hold: > (string-as-multibyte str) == (decode-coding-string str 'internal) > (string-make-multibyte str) == (decode-coding-string str 'default) > (string-to-multibyte str) == (decode-coding-string str 'raw-text) > I said "almost" because: Please note that decode-coding-string also does eol conversion. Using 'internal-unix, 'default-unix, 'raw-text-unix will make them more equivalent. > 1 - there is no `internal' coding-system as of now. In Emacs-21 we'd > use `emacs-mule' but for Emacs-22 it would be `utf-8-emacs'. > I'm still not sure what such a thing is useful for, tho (see > my other email). Before we introduced eight-bit-XXXX, (insert (string-as-multibyte UNIBYTE-STRING)) was the only way to preserve the original byte sequence in a multibyte buffer. But, as we now have eight-bit-XXXX, I agree that string-as-multibyte is not that useful, string-to-multibyte is better. > 2 - there is no `default' coding-system either. 
Or maybe > locale-coding-system is this default: if your locale is > latin-1 then that's latin-1. If one does not do set-language-enviroment, locale-coding-system can be used as `default'. > For non-8-bit locales, I don't know what > string-make-multibyte does. In that case, it does latin-1 decoding, ... yes, not that good. > 3 - when called with a `raw-text' coding-system, decode-coding-string > returns a unibyte string, which is obviously not what we want here. > It might make sense for internal operations to return unibyte > strings for the `raw-text' case, but I was really surprised that > decode-coding-string would ever return a unibyte string. I tend to agree that it is better that decode-coding-string always return a multibyte string now. > I think avoiding string-FOO-multibyte and using decode-coding-string > instead would make things a lot more clear. I think string-FOO-multibyte (and also string-FOO-unibyte) are conceptually different from decoding (and encoding) operations. It's difficult for me to explain it clearly, but I'll try. Decoding and encoding are interface between Emacs and the outer world. Decoding is for converting an external byte sequence (i.e. belonging to a world out of Emacs) into Emacs' represenatation. Encoding is for converting Emacs' represenatation to a byte sequence that is used out of Emacs. But string-FOO-multi/unibyte are convesion within Emacs' world. And, if one wants to insert a result of encode-coding-string in a multibyte buffer (perhaps for some post-processing), what he should do? If we have string-to-multibyte, we can do this: (insert (string-to-multibyte (encode-coding-string MULTIBYTE-STRING CODING))) If we don't have it, and provided that decode-coding-string always returns a multibyte string, we must do: (insert (decode-coding-string (encode-coding-string MULTIBYTE-STRING CODING) 'raw-text-unix)) Isn't it very funny? 
By the way, I think the culprit of the current problem is this Emacs doctrine: Do unibyte<->multibyte conversion by "MAKE" by default. Although this doctrine surely works for handling unibyte and multibyte representations transparently, it makes Elisp programmers very very confused. And it is useful only for people whose main charset is single-byte. I am seriously considering changing it in emacs-unicode. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 49+ messages in thread
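The `string-to-multibyte' form from this message, inserting the result of `encode-coding-string' into a multibyte buffer for post-processing, could be sketched as follows. This is only a sketch, assuming a current Emacs where `string-to-multibyte' and `string-to-unibyte' exist; the helper names `my-insert-encoded' and `my-encoded-bytes-via-buffer' are hypothetical and not part of Emacs.

```elisp
(defun my-insert-encoded (string coding)
  "Insert the bytes of STRING encoded by CODING into the current buffer.
Each byte becomes one character (ASCII or an eight-bit raw byte), so
the byte sequence can be post-processed in a multibyte buffer.
Hypothetical helper, for illustration only."
  (insert (string-to-multibyte (encode-coding-string string coding))))

(defun my-encoded-bytes-via-buffer (string coding)
  "Return the bytes of STRING under CODING after a trip through a buffer.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (set-buffer-multibyte t)        ; make the buffer's multibyteness explicit
    (my-insert-encoded string coding)
    ;; Post-processing would go here; this sketch just reads the bytes back.
    (string-to-unibyte (buffer-string))))

;; Usage: the UTF-8 bytes of U+00E9 survive the trip through the buffer.
(my-encoded-bytes-via-buffer "\u00e9" 'utf-8)
```

Whether this reads better than the `decode-coding-string' variant is exactly what the thread debates; the sketch shows only the `string-to-multibyte' side.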
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 8:04 ` Kenichi Handa @ 2003-01-21 15:02 ` Miles Bader 2003-01-21 17:44 ` Stefan Monnier 2003-01-22 10:00 ` Richard Stallman 2 siblings, 0 replies; 49+ messages in thread From: Miles Bader @ 2003-01-21 15:02 UTC (permalink / raw) Cc: monnier+gnu/emacs On Tue, Jan 21, 2003 at 05:04:37PM +0900, Kenichi Handa wrote: > And, if one wants to insert a result of encode-coding-string > in a multibyte buffer (perhaps for some post-processing), > what he should do? If we have string-to-multibyte, we can > do this: > (insert (string-to-multibyte > (encode-coding-string MULTIBYTE-STRING CODING))) > If we don't have it, and provided that decode-coding-string > always returns a multibyte string, we must do: > (insert (decode-coding-string > (encode-coding-string MULTIBYTE-STRING CODING) 'raw-text-unix)) > Isn't it very funny? Actually I find the second variant much _more_ clear, as it makes it very obvious what's happening. -Miles -- `The suburb is an obsolete and contradictory form of human settlement' ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 8:04 ` Kenichi Handa 2003-01-21 15:02 ` Miles Bader @ 2003-01-21 17:44 ` Stefan Monnier 2003-01-22 10:00 ` Richard Stallman 2 siblings, 0 replies; 49+ messages in thread From: Stefan Monnier @ 2003-01-21 17:44 UTC (permalink / raw) Cc: emacs-devel > In article <200301210045.h0L0jS812745@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: > >> unibyte sequence (hex): 81 81 C0 C0 > >> result of conversion display in multbyte buffer > >> string-as-multibyte: 9E A1 81 C0 C0 \201À\300 > >> string-make-multibyte: 9E A1 9E A1 81 C0 81 C0 \201\201ÀÀ > >> string-to-multibyte: 9E A1 9E A1 C0 C0 \201\201\300\300 > > > I find the terminology and the concepts confusing. > > I agree that those names are not that intuitive, but the > first two were there before I noticed it. :-p > But, in what sense, the concepts are confusing? The concept of string-as-multibyte made some sense in Emacs-20 when it was really "look under the hood: take the same bytes but interpret them differently". In Emacs-21, this is not the case any more so I don't really understand what's the intent behind it other than emacs-mule decoding (that it might happen to come out of some other decoding step rather than out of a file is not really relevant, I think). I think what I find confusing is that the name of those functions implicitly says "take the string and give me the same one, but just multibyte instead of unibyte", even though there's no unambiguous way to have "the same one". So there has to be a choice of how the conversion between unibyte and multibyte takes place, but this choice is not clearly described by the functions's name. > Please note that decode-coding-string also does eol > conversion. Using 'internal-unix, 'default-unix, Sorry for my sloppyness. > 'raw-text-unix will make them more equivalent. This should probably be `no-conversion' (or `binary'). 
Admittedly, it's the same, but I think it carries the intent a bit better. > But, as we now have eight-bit-XXXX, I agree that > string-as-multibyte is not that useful, string-to-multibyte > is better. But they do different things and the name-difference does not explain clearly the subtle distinction, so I think it's more confusing than anything else. > > 2 - there is no `default' coding-system either. Or maybe > > locale-coding-system is this default: if your locale is > > latin-1 then that's latin-1. > > If one does not do set-language-enviroment, > locale-coding-system can be used as `default'. And otherwise ? The mere fact that I don't know the answer to this question seems like a good indication that pretty much nobody knows what `string-make-multibyte' does, so anyone who uses it is most likely using it wrong. Luckily, it seems only ps-mule.el uses it (although much more code uses the underlying nonascii-translation-table functionality). > > 3 - when called with a `raw-text' coding-system, decode-coding-string > > returns a unibyte string, which is obviously not what we want here. > > It might make sense for internal operations to return unibyte > > strings for the `raw-text' case, but I was really surprised that > > decode-coding-string would ever return a unibyte string. > > I tend to agree that it is better that decode-coding-string > always return a multibyte string now. If it can be fixed, we can recommend (decode-coding-string str 'no-conversion) rather than introducing a new function string-to-multibyte. > I think string-FOO-multibyte (and also string-FOO-unibyte) > are conceptually different from decoding (and encoding) > operations. It's difficult for me to explain it clearly, > but I'll try. > > Decoding and encoding are interface between Emacs and the > outer world. > > Decoding is for converting an external byte sequence > (i.e. belonging to a world out of Emacs) into Emacs' > representation. 
> > Encoding is for converting Emacs' represenatation to a byte > sequence that is used out of Emacs. But the `emacs-mule' coding-system is used both inside and outside, and same goes for `binary', so the distinction between inside and outside is not very clear-cut. I find it more helpful to think in terms of bytes and chars: unibyte strings are sequences of bytes while multibyte strings are sequences of chars. Converting between bytes and chars is the purpose of coding-systems. In such a context, string-FOO-multibyte are obviously just various forms of decoding, but the names don't give a good sense of which decoding is used. > And, if one wants to insert a result of encode-coding-string > in a multibyte buffer (perhaps for some post-processing), > what he should do? If we have string-to-multibyte, we can > do this: > (insert (string-to-multibyte > (encode-coding-string MULTIBYTE-STRING CODING))) > If we don't have it, and provided that decode-coding-string > always returns a multibyte string, we must do: > (insert (decode-coding-string > (encode-coding-string MULTIBYTE-STRING CODING) 'raw-text-unix)) > Isn't it very funny? Obviously, I agree with Miles, that the second is much more clear (especially if you replace `raw-text-unix' with `no-conversion'. well, I prefer `binary' myself, since the `no-conversion' is also a misnomer given that a conversion does take place). > By the way, I think the culprit of the current problem is > this Emacs' doctrine: > Do unibyte<->mutibyte conversion by "MAKE" by default. Since MAKE uses some kind of "default" related to the current language environment, I think it's OK, except that it's not clear in what way it's "related". But of course, there should simply never be such a thing as "guess what this unibyte stream translates into". The coding-system used to decode unibyte into multibyte should always be "clearly" defined (by the process's coding-system, the keyboard's coding-system, ...). I.e. 
it is simply a bug to insert a unibyte string into a multibyte buffer (and vice versa). As for inserting a char between 128 and 256 into a multibyte buffer... it should ideally always be treated as an eight-bit-foo char, but I think that making such a change right now would not be wise because there is still too much code which forgets to decode its bytes into chars (and instead relies on the MAKE default to turn those chars into latin-1 chars). > Although this doctrine surely works for handling unibyte and > multibyte representations transparently, it makes Elisp > programmers very very confused. And it is useful only for > people whose main charset is single-byte. > > I am seriously considering changing it in emacs-unicode. Might be a good idea for emacs-unicode indeed. Stefan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 8:04 ` Kenichi Handa 2003-01-21 15:02 ` Miles Bader 2003-01-21 17:44 ` Stefan Monnier @ 2003-01-22 10:00 ` Richard Stallman 2 siblings, 0 replies; 49+ messages in thread From: Richard Stallman @ 2003-01-22 10:00 UTC (permalink / raw) Cc: monnier+gnu/emacs

    By the way, I think the culprit of the current problem is
    this Emacs' doctrine:
        Do unibyte<->multibyte conversion by "MAKE" by default.

    Although this doctrine surely works for handling unibyte and
    multibyte representations transparently, it makes Elisp
    programmers very very confused.

It is absolutely crucial to make unibyte and multibyte operation interoperate smoothly for Latin character sets. We did this so that people who used Emacs for European character sets would not have trouble. It is more important to make things easy for non-programmers than to make it easy for programmers. So don't change this.

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 0:10 ` Kenichi Handa 2003-01-21 0:45 ` Stefan Monnier @ 2003-01-21 5:56 ` Eli Zaretskii 2003-01-21 6:38 ` Kenichi Handa 2003-01-22 10:00 ` Richard Stallman 2 siblings, 1 reply; 49+ messages in thread From: Eli Zaretskii @ 2003-01-21 5:56 UTC (permalink / raw) Cc: emacs-devel On Tue, 21 Jan 2003, Kenichi Handa wrote: > > Sorry, I don't think I understand the difference. What will we have > > in the buffer after process output is converted as you describe in the > > last paragraph above? > > Ok, here's an example (Latin-1 lang. env.). Thanks for taking time to explain these subtleties. > (1) Reading a process output by raw-text into a multibyte > buffer does AS conversion. I think this should do TO > conversion to be consistent with (3). I tend to agree, but without knowing the exact reason for the current behavior, I fear that we will break something. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 5:56 ` Eli Zaretskii @ 2003-01-21 6:38 ` Kenichi Handa 0 siblings, 0 replies; 49+ messages in thread From: Kenichi Handa @ 2003-01-21 6:38 UTC (permalink / raw) Cc: emacs-devel In article <Pine.SUN.3.91.1030121075455.9650B-100000@is>, Eli Zaretskii <eliz@is.elta.co.il> writes: > On Tue, 21 Jan 2003, Kenichi Handa wrote: >> > Sorry, I don't think I understand the difference. What will we have >> > in the buffer after process output is converted as you describe in the >> > last paragraph above? >> >> Ok, here's an example (Latin-1 lang. env.). > Thanks for taking time to explain these subtleties. >> (1) Reading a process output by raw-text into a multibyte >> buffer does AS conversion. I think this should do TO >> conversion to be consistent with (3). > I tend to agree, but without knowing the exact reason for the current > behavior, I fear that we will break something. I don't remember well now. :-( Perhaps because there was a choice only between string-as-multibyte and string-make-multibyte, and I thought string-as-multibyte was less surprising than string-make-multibyte. That is because the reason for specifying raw-text or no-conversion for a process should be that one doesn't want code conversion. If we use string-make-multibyte, the result is almost the same as decoding by, for instance, iso-latin-1. If we had had string-to-multibyte at that time, I might have used it. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-21 0:10 ` Kenichi Handa 2003-01-21 0:45 ` Stefan Monnier 2003-01-21 5:56 ` Eli Zaretskii @ 2003-01-22 10:00 ` Richard Stallman 2003-01-22 14:12 ` Stefan Monnier 2 siblings, 1 reply; 49+ messages in thread From: Richard Stallman @ 2003-01-22 10:00 UTC (permalink / raw) Cc: emacs-devel

    unibyte sequence (hex): 81 81 C0 C0

                            result of conversion     display in multibyte buffer
    string-as-multibyte:    9E A1 81 C0 C0           \201À\300
    string-make-multibyte:  9E A1 9E A1 81 C0 81 C0  \201\201ÀÀ
    string-to-multibyte:    9E A1 9E A1 C0 C0        \201\201\300\300

I think this example should go in the Lisp manual. Could you add it?

    (1) Reading a process output by raw-text into a multibyte
    buffer does AS conversion. I think this should do TO
    conversion to be consistent with (3).

That is a strong argument in favor. Does anyone see any arguments against this change?

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28] 2003-01-22 10:00 ` Richard Stallman @ 2003-01-22 14:12 ` Stefan Monnier 0 siblings, 0 replies; 49+ messages in thread From: Stefan Monnier @ 2003-01-22 14:12 UTC (permalink / raw) Cc: Kenichi Handa > (1) Reading a process output by raw-text into a multibyte > buffer does AS conversion. I think this should do TO > conversion to be consistent with (3). > > That is a strong argument in favor. 100% agreement: AS corresponds to the emacs-mule coding-system, whereas raw-text should be like TO. Stefan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-18 0:48 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman 2003-01-18 12:35 ` Kim F. Storm 2003-01-18 12:40 ` Eli Zaretskii @ 2003-01-20 1:52 ` Kenichi Handa 2003-01-21 18:18 ` Richard Stallman 2003-01-21 18:18 ` Richard Stallman 2 siblings, 2 replies; 49+ messages in thread From: Kenichi Handa @ 2003-01-20 1:52 UTC (permalink / raw) Cc: emacs-devel In article <E18Zh9W-00012L-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes: > ! The string argument is normally a multibyte string, except: > ! - if the process' input coding system is no-conversion or raw-text, > ! it is a unibyte string (the non-converted input), or else > Is this really the right way for it to work? > Should the choice of unibyte or multibyte string > be tied in this way to the choice of coding system? This facility was added upon someone's request or to fix some problem long ago. 1998-12-21 Kenichi Handa <handa@etl.go.jp> [...] * process.c (read_process_output): Decide the multibyteness of string given to a process filter by a coding system used for decoding the process output. But, I don't remember the details now. > If you want multibyte strings "without decoding", would emacs-mule > give you that? It depends on what kind of multibyte string we want. If we want the same result as reading a file containing the same byte sequence by emacs-mule, emacs-mule is fine. This is the same as reading by no-conversion and inserting the given unibyte string by: (insert (string-as-multibyte UNIBYTE-STRING)).
If we want a multibyte sequence that is the same as the result of converting each of the original bytes by unibyte-char-to-multibyte, we must read by no-conversion, and insert the given unibyte string just by `insert': (insert UNIBYTE-STRING) This is the same as doing: (insert (string-make-multibyte UNIBYTE-STRING)) If we want a multibyte sequence in which each character is one of ascii, eight-bit-control, and eight-bit-graphic corresponding to the original bytes, we must read by no-conversion, and insert characters one by one as below: (apply 'insert (string-to-list UNIBYTE-STRING)) Perhaps we should have a function, say string-to-multibyte, that makes this possible: (insert (string-to-multibyte UNIBYTE-STRING)) --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-20 1:52 ` Emacs-diffs Digest, Vol 2, Issue 28 Kenichi Handa @ 2003-01-21 18:18 ` Richard Stallman 2003-01-28 0:32 ` Kenichi Handa 2003-01-21 18:18 ` Richard Stallman 1 sibling, 1 reply; 49+ messages in thread From: Richard Stallman @ 2003-01-21 18:18 UTC (permalink / raw) Cc: emacs-devel

    [...]
    * process.c (read_process_output): Decide the multibyteness of
    string given to a process filter by a coding system used for
    decoding the process output.

    But, I don't remember the details now.

set-process-multibyte should be a good solution.

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-21 18:18 ` Richard Stallman @ 2003-01-28 0:32 ` Kenichi Handa 2003-01-28 12:35 ` Kim F. Storm 2003-03-03 18:59 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman 0 siblings, 2 replies; 49+ messages in thread From: Kenichi Handa @ 2003-01-28 0:32 UTC (permalink / raw) Cc: emacs-devel In article <E18b2yf-0007yS-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes: > [...] > * process.c (read_process_output): Decide the multibyteness of > string given to a process filter by a coding system used for > decoding the process output. > But, I don't remember the details now. > set-process-multibyte should be a good solution. I think we have extrapolated the behaviour of process I/O from that of file I/O. But set-process-multibyte breaks it. It's something like declaring the multibyteness of a process, but we don't declare that of a file. So it's not easy to extrapolate the exact semantics of set-process-multibyte. It is fairly clear for the case where a process has a filter and the coding system is no-conversion or raw-text. But what should we do with the other coding systems (e.g. Latin-1, iso-2022-jp)? If a process is unibyte, what kind of string should the filter get? Should we suppress text decoding, as in the case of inserting a file into a unibyte buffer? And if the process has no filter but has a buffer, what should we do for a unibyte process that has a multibyte buffer, or for a multibyte process that has a unibyte buffer? And what should we do with process-send-string/region? I think it is better to keep extrapolating the behaviour of process reading/sending from file reading/writing. For inserting process output in a buffer, there's no difficulty in extrapolating the behaviour. For a filter, although we don't have a function like string-from-file, the closest equivalent is this:
(with-temp-buffer (insert-file-contents FILE) (buffer-string)) A string given to a process filter must be the same as the result of that code, which means that default-enable-multibyte-characters decides the multibyteness, and if it is nil, character conversion except for end-of-line conversion is suppressed. The only question is when to check default-enable-multibyte-characters. When a process is created, or just before calling a filter? I think the former is more like file I/O. And it may be ok to have a function set-process-filter-multibyte which can change the multibyteness of a string to a filter on the way. Or, was the intention of set-process-multibyte actually set-process-filter-multibyte? --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 49+ messages in thread
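The string-from-file idea mentioned in this message can be written out as a small helper. This is a minimal sketch that only wraps the `with-temp-buffer' form quoted above; the name `my-string-from-file' is hypothetical, and no function of that name exists in Emacs. How the text is decoded depends on the usual `insert-file-contents' rules in whatever Emacs version runs it.

```elisp
(defun my-string-from-file (file)
  "Return the contents of FILE as a string.
The text is decoded the way `insert-file-contents' decodes it for a
fresh temporary buffer.  Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert-file-contents file)
    (buffer-string)))
```

Under the proposal in this message, a process filter would receive the same kind of string that this helper returns for a file holding the same bytes.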
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-28 0:32 ` Kenichi Handa @ 2003-01-28 12:35 ` Kim F. Storm 2003-02-10 8:15 ` set-process-filter-multibyte and etc Kenichi Handa 2003-03-03 18:59 ` Emacs-diffs Digest, Vol 2, Issue 28 Richard Stallman 1 sibling, 1 reply; 49+ messages in thread From: Kim F. Storm @ 2003-01-28 12:35 UTC (permalink / raw) Cc: emacs-devel Kenichi Handa <handa@m17n.org> writes: > I think it is better to keep extrapolating the behaviour of > process reading/sending from file reading/writing. I agree. > For a filter, although we don't have a function something > like string-from-file, the most resembling code will be > this. > > (with-temp-buffer > (insert-file-contents FILE) > (buffer-string)) > > A string given to a process filter must be the same as the > result of that code, which means that > default-enable-multibyte-characters decides the > multibyteness, and if it is nil, character conversion except > for end-of-line conversion is suppressed. > > The only question is when to check > default-enable-multibyte-characters. When a process is > created, or just before calling a filter? I think the > former is more like file I/O. And it may be ok to have a > function set-process-filter-multibyte which can change the > multibyteness of a string to a filter on the way. Good points. Also, there could be a new `:multibyte BOOL' argument to make-network-process to initialize the filter multibyteness of the new process; specifying this would override the setting of default-enable-multibyte-characters. > > Or, was the intention of set-process-multibyte actually > set-process-filter-multibyte? At least, that was the problem I was looking at when I suggested it, so yes. -- Kim F. Storm <storm@cua.dk> http://www.cua.dk ^ permalink raw reply [flat|nested] 49+ messages in thread
* set-process-filter-multibyte and etc. 2003-01-28 12:35 ` Kim F. Storm @ 2003-02-10 8:15 ` Kenichi Handa 2003-02-10 14:57 ` Kim F. Storm 2003-02-20 1:27 ` Tak Ota 0 siblings, 2 replies; 49+ messages in thread From: Kenichi Handa @ 2003-02-10 8:15 UTC (permalink / raw) Cc: emacs-devel In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes: >> The only question is when to check >> default-enable-multibyte-characters. When a process is >> created, or just before calling a filter? I think the >> former is more like file I/O. And it may be ok to have a >> function set-process-filter-multibyte which can change the >> multibyteness of a string to a filter on the way. > Good points. Also, there could be a new `:multibyte BOOL' argument to > make-network-process to initialize the filter multibyteness of the new > process; specifying this would override the setting of > default-enable-multibyte-characters. >> Or, was the intention of set-process-multibyte actually >> set-process-filter-multibyte? > At least, that was the problem I was looking at when I suggested it, > so yes. I've just installed changes for set-process-filter-multibyte and etc. I added the followings to etc/NEWS. Could people please fix my English. --- Ken'ichi HANDA handa@m17n.org ** New function `set-process-filter-multibyte' sets the multibyteness of a string given to a process's filter. ** New function `process-filter-multibyte-p' returns t if a string given to a process's filter is multibyte. ** A filter function of a process is called with a multibyte string if the filter's multibyteness is t. That multibyteness is decided by the value of `default-enable-multibyte-characters' when the process is created and can be changed later by `set-process-filter-multibyte'. ** If a process's coding system is raw-text or no-conversion and its buffer is multibyte, the output of the process is at first converted to multibyte by `string-to-multibyte' then inserted in the buffer. 
Previously, it was converted to multibyte by `string-as-multibyte', which was not compatible with the behaviour of file reading. ** New function `string-to-multibyte' converts a unibyte string to a multibyte string with the same individual character codes. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: set-process-filter-multibyte and etc. 2003-02-10 8:15 ` set-process-filter-multibyte and etc Kenichi Handa @ 2003-02-10 14:57 ` Kim F. Storm 2003-02-11 0:15 ` Kenichi Handa 2003-02-20 1:27 ` Tak Ota 1 sibling, 1 reply; 49+ messages in thread From: Kim F. Storm @ 2003-02-10 14:57 UTC (permalink / raw) Cc: emacs-devel Kenichi Handa <handa@m17n.org> writes: > In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes: > >> The only question is when to check > >> default-enable-multibyte-characters. When a process is > >> created, or just before calling a filter? I think the > >> former is more like file I/O. And it may be ok to have a > >> function set-process-filter-multibyte which can change the > >> multibyteness of a string to a filter on the way. > > > Good points. Also, there could be a new `:multibyte BOOL' argument to > > make-network-process to initialize the filter multibyteness of the new > > process; specifying this would override the setting of > > default-enable-multibyte-characters. > > >> Or, was the intention of set-process-multibyte actually > >> set-process-filter-multibyte? > > > At least, that was the problem I was looking at when I suggested it, > > so yes. > > I've just installed changes for set-process-filter-multibyte > and etc. Great. I've fixed a few typos and doc strings. If a process buffer is specified to start-process or make-network-process, it would seem logical to assume that the process filter will eventually insert the string into that buffer. So I wonder whether it would make sense to let the default filter multibyteness depend on the multibyteness of the BUFFER argument to start-process and make-network-process (if specified and non-nil)? And only revert to default-enable-multibyte-characters if BUFFER is nil. -- Kim F. Storm <storm@cua.dk> http://www.cua.dk ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: set-process-filter-multibyte and etc. 2003-02-10 14:57 ` Kim F. Storm @ 2003-02-11 0:15 ` Kenichi Handa 0 siblings, 0 replies; 49+ messages in thread From: Kenichi Handa @ 2003-02-11 0:15 UTC (permalink / raw) Cc: emacs-devel In article <5xwuk8che8.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes: > Kenichi Handa <handa@m17n.org> writes: >> In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes: >> >> The only question is when to check >> >> default-enable-multibyte-characters. When a process is >> >> created, or just before calling a filter? I think the >> >> former is more like file I/O. And it may be ok to have a >> >> function set-process-filter-multibyte which can change the >> >> multibyteness of a string to a filter on the way. >> >> > Good points. Also, there could be a new `:multibyte BOOL' argument to >> > make-network-process to initialize the filter multibyteness of the new >> > process; specifying this would override the setting of >> > default-enable-multibyte-characters. >> >> >> Or, was the intention of set-process-multibyte actually >> >> set-process-filter-multibyte? >> >> > At least, that was the problem I was looking at when I suggested it, >> > so yes. >> >> I've just installed changes for set-process-filter-multibyte >> and etc. > Great. I've fixed a few typos and doc strings. Thank you. > If a process buffer is specified to start-process or make-network-process, > it would seem logical to assume that the process filter will > eventually insert the string into that buffer. > So I wonder whether it would make sense to let the default filter > multibyteness depend on the multibyteness of the BUFFER argument to > start-process and make-network-process (if specified and non-nil)? > And only revert to default-enable-multibyte-characters if BUFFER is nil. That sounds reasonable. If there's no objection, I'll change the code. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 49+ messages in thread
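A rough sketch of how the filter multibyteness discussed in this exchange might be exercised from Lisp. The names `my-demo-describe-string' and `my-demo-watch-process' are hypothetical. `set-process-filter-multibyte' is the function announced in this thread; the sketch calls it only when the running Emacs actually defines it, since that cannot be assumed for every version.

```elisp
;; Sketch only.  `my-demo-describe-string' and `my-demo-watch-process'
;; are made-up names.  `set-process-filter-multibyte' is the function
;; announced in this thread; it is called only if this Emacs has it.
(defun my-demo-describe-string (string)
  "Return a short description of STRING's representation and length."
  (format "%s string of %d chars"
          (if (multibyte-string-p string) "multibyte" "unibyte")
          (length string)))

(defun my-demo-watch-process (proc multibyte)
  "Install a filter on PROC that reports the kind of string it receives.
MULTIBYTE non-nil requests multibyte strings, nil requests unibyte ones."
  (when (fboundp 'set-process-filter-multibyte)
    (set-process-filter-multibyte proc multibyte))
  (set-process-filter
   proc
   (lambda (_proc string)
     (message "filter got a %s" (my-demo-describe-string string)))))
```

Calling `(my-demo-watch-process PROC nil)' on a live process object would then log, for each chunk of output, whether the filter was handed a unibyte or a multibyte string.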
* Re: set-process-filter-multibyte and etc. 2003-02-10 8:15 ` set-process-filter-multibyte and etc Kenichi Handa 2003-02-10 14:57 ` Kim F. Storm @ 2003-02-20 1:27 ` Tak Ota 2003-02-20 1:56 ` Kenichi Handa 1 sibling, 1 reply; 49+ messages in thread From: Tak Ota @ 2003-02-20 1:27 UTC (permalink / raw) Cc: emacs-devel 2003-02-10 Kenichi Handa <handa@m17n.org> * process.c (QCfilter_multibyte): New variable. (setup_process_coding_systems): New function. setup_process_coding_systems should check the validity of inch and outch since they can be (-1) when the buffer is killed and setup_process_coding_systems is called downstream of exec_sentinel. Here is the actual case I encountered. EMACS! memset + 65 bytes setup_coding_system(int 0x114f893c, coding_system * 0x11377064) line 3401 + 22 bytes setup_process_coding_systems(int 0x41d41780) line 605 + 23 bytes Fset_process_buffer(int 0x41d41780, int 0x11342404) line 855 + 9 bytes Ffuncall(int 0x00000003, int * 0x0082f448) line 2744 + 25 bytes Fbyte_code(int 0x31bd5834, int 0x41beb580, int 0x00000004) line 709 + 16 bytes funcall_lambda(int 0x41b7a440, int 0x00000002, int * 0x0082f684) line 2929 + 43 bytes Ffuncall(int 0x00000003, int * 0x0082f680) line 2788 + 20 bytes Fapply(int 0x00000002, int * 0x0082f6cc) line 2247 + 13 bytes apply1(int 0x11b30894, int 0x51c8d95c) line 2500 + 11 bytes read_process_output_call() line 4374 + 29 bytes internal_condition_case_1(int (void)* 0x01050bfd read_process_output_call(void), int 0x51c8d954, int 0x1135c214, int (void)* 0x01052894 exec_sentinel_error_handler(void)) line 1392 + 7 bytes exec_sentinel() line 5971 + 98 bytes status_notify() line 6068 + 13 bytes wait_reading_process_input(int 0x0000001e, int 0x00000000, int 0x0fffffff, int 0x00000001) line 3939 sit_for(int 0x0000001e, int 0x00000000, int 0x00000001, int 0x00000001, int 0x00000000) line 6249 + 21 bytes read_char(int 0x00000001, int 0x00000003, int * 0x0082fa7c, int 0x11342404, int * 0x0082fbdc) line 2681 + 36 bytes 
read_key_sequence(int * 0x0082fd40, int 0x0000001e, int 0x11342404, int 0x00000000, int 0x00000001, int 0x00000001) line 8566 + 41 bytes command_loop_1() line 1492 + 27 bytes internal_condition_case(int (void)* 0x0101409f command_loop_1(void), int 0x1135c214, int (void)* 0x01013c71 cmd_error(void)) line 1351 + 3 bytes command_loop_2() line 1290 + 21 bytes internal_catch(int 0x113517c4, int (void)* 0x01013f67 command_loop_2(void), int 0x11342404) line 1112 + 7 bytes command_loop() line 1269 + 23 bytes recursive_edit_1() line 985 + 5 bytes Frecursive_edit() line 1042 main() line 1661 EMACS! mainCRTStartup + 180 bytes _start() line 136 KERNEL32! 77ea847c() -Tak Mon, 10 Feb 2003 17:15:16 +0900 (JST): Kenichi Handa <handa@m17n.org> wrote: > In article <5xsmvdbgf7.fsf@kfs2.cua.dk>, storm@cua.dk (Kim F. Storm) writes: > >> The only question is when to check > >> default-enable-multibyte-characters. When a process is > >> created, or just before calling a filter? I think the > >> former is more like file I/O. And it may be ok to have a > >> function set-process-filter-multibyte which can change the > >> multibyteness of a string to a filter on the way. > > > Good points. Also, there could be a new `:multibyte BOOL' argument to > > make-network-process to initialize the filter multibyteness of the new > > process; specifying this would override the setting of > > default-enable-multibyte-characters. > > >> Or, was the intention of set-process-multibyte actually > >> set-process-filter-multibyte? > > > At least, that was the problem I was looking at when I suggested it, > > so yes. > > I've just installed changes for set-process-filter-multibyte > and etc. I added the followings to etc/NEWS. Could people > please fix my English. > > --- > Ken'ichi HANDA > handa@m17n.org > > ** New function `set-process-filter-multibyte' sets the multibyteness > of a string given to a process's filter. 
> > ** New function `process-filter-multibyte-p' returns t if > a string given to a process's filter is multibyte. > > ** A filter function of a process is called with a multibyte string if > the filter's multibyteness is t. That multibyteness is decided by the > value of `default-enable-multibyte-characters' when the process is > created and can be changed later by `set-process-filter-multibyte'. > > ** If a process's coding system is raw-text or no-conversion and its > buffer is multibyte, the output of the process is at first converted > to multibyte by `string-to-multibyte' then inserted in the buffer. > Previously, it was converted to multibyte by `string-as-multibyte', > which was not compatible with the behaviour of file reading. > > ** New function `string-to-multibyte' converts a unibyte string to a > multibyte string with the same individual character codes. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: set-process-filter-multibyte and etc. 2003-02-20 1:27 ` Tak Ota @ 2003-02-20 1:56 ` Kenichi Handa 2003-02-20 2:44 ` Tak Ota 0 siblings, 1 reply; 49+ messages in thread From: Kenichi Handa @ 2003-02-20 1:56 UTC (permalink / raw) Cc: emacs-devel In article <20030219.172710.60851987.Takaaki.Ota@am.sony.com>, Tak Ota <Takaaki.Ota@am.sony.com> writes: > setup_process_coding_systems should check the validity of inch and > outch since they can be (-1) when the buffer is killed and > setup_process_coding_systems is called downstream of exec_sentinel. Thank you for the bug report. I've just installed the attached change. Does it fix the problem? --- Ken'ichi HANDA handa@m17n.org 2003-02-20 Kenichi Handa <handa@m17n.org> * process.c (setup_process_coding_systems): If the process's in/out descriptor is -1, do nothing. Index: process.c =================================================================== RCS file: /cvsroot/emacs/emacs/src/process.c,v retrieving revision 1.399 retrieving revision 1.400 diff -u -c -r1.399 -r1.400 cvs server: conflicting specifications of output style *** process.c 10 Feb 2003 13:51:43 -0000 1.399 --- process.c 20 Feb 2003 01:54:09 -0000 1.400 *************** *** 598,603 **** --- 598,606 ---- int inch = XINT (p->infd); int outch = XINT (p->outfd); + if (inch < 0 || outch < 0) + return; + if (!proc_decode_coding_system[inch]) proc_decode_coding_system[inch] = (struct coding_system *) xmalloc (sizeof (struct coding_system)); ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: set-process-filter-multibyte and etc. 2003-02-20 1:56 ` Kenichi Handa @ 2003-02-20 2:44 ` Tak Ota 0 siblings, 0 replies; 49+ messages in thread From: Tak Ota @ 2003-02-20 2:44 UTC (permalink / raw) Cc: emacs-devel Thu, 20 Feb 2003 10:56:04 +0900 (JST): Kenichi Handa <handa@m17n.org> wrote: > In article <20030219.172710.60851987.Takaaki.Ota@am.sony.com>, Tak Ota <Takaaki.Ota@am.sony.com> writes: > > setup_process_coding_systems should check the validity of inch and > > outch since they can be (-1) when the buffer is killed and > > setup_process_coding_systems is called downstream of exec_sentinel. > > Thank you for the bug report. I've just installed the > attached change. Does it fix the problem? Yes it does. Thank you. -Tak ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-28 0:32 ` Kenichi Handa 2003-01-28 12:35 ` Kim F. Storm @ 2003-03-03 18:59 ` Richard Stallman 1 sibling, 0 replies; 49+ messages in thread From: Richard Stallman @ 2003-03-03 18:59 UTC (permalink / raw) Cc: emacs-devel It is fairly clear for the case that a process has a filter and its coding system is no-conversion or raw-text. But what to do with the other coding systems (e.g. Latin-1, iso-2022-jp)? If a process is unibyte, what kind of string should the filter get? Should we suppress text decoding, as in the case of inserting a file into a unibyte buffer? I don't see any other meaningful thing to do in that case. Decoding produces, in general, data that cannot go in a unibyte string. And if the process has no filter but has a buffer, what to do for a unibyte process that has a multibyte buffer, or for a multibyte process that has a unibyte buffer? These are unreasonable cases, so there is no need to strain too hard to make them do useful things. For the unibyte process with a multibyte buffer, just turn off decoding, and let the data get converted to multibyte when inserted in the buffer. For the multibyte process with unibyte buffer, it can convert the multibyte decoded text to unibyte. If people feel that isn't useful, they shouldn't use it. And what to do with process-send-string/region. If the process is multibyte, encode the text (after converting it first to multibyte if necessary). If the process is unibyte, just convert the text to unibyte. A string given to a process filter must be the same as the result of that code, which means that default-enable-multibyte-characters decides the multibyteness, and if it is nil, character conversion except for end-of-line conversion is suppressed. default-enable-multibyte-characters should not directly control the multibyteness of the string. The process's multibyte flag should control that. 
But it is a good idea to let default-enable-multibyte-characters play a role in initializing the process's multibyte flag. Or, was the intention of set-process-multibyte actually set-process-filter-multibyte? You could think of it that way, because the case where set-process-multibyte is useful is when the process has a filter. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-20 1:52 ` Emacs-diffs Digest, Vol 2, Issue 28 Kenichi Handa 2003-01-21 18:18 ` Richard Stallman @ 2003-01-21 18:18 ` Richard Stallman 2003-01-27 12:20 ` Kenichi Handa 1 sibling, 1 reply; 49+ messages in thread From: Richard Stallman @ 2003-01-21 18:18 UTC (permalink / raw) Cc: emacs-devel If we want a multibyte sequence but each character contained is one of ascii, eight-bit-control, and eight-bit-graphic corresponding to the original bytes, we must read by no-conversion, and insert characters one by one as below: (apply 'insert (string-to-list UNIBYTE-STRING)) Perhaps we must have a function, say, string-to-multibyte, to make this possible: (insert (string-to-multibyte UNIBYTE-STRING)) I think we do need this function string-to-multibyte, and its doc string should carefully explain how it differs from string-make-multibyte. I don't remember why the current code does as above. I think the behaviour Eli described is more consistent with the behaviour of file reading. Shall I change the code as Eli wrote (by introducing the new function string-to-multibyte)? Please do introduce string-to-multibyte. What other change do you propose? I am not sure how what Eli wrote differs from the current behavior. By the way, it may be clean to have all these functions in parallel, and spare one section describing the difference of MAKE, AS, TO conversions in info. string-make-multibyte string-as-multibyte string-to-multibyte These three are all useful. string-make-unibyte string-as-unibyte string-to-unibyte (perhaps the same as string-as-unibyte, or it should signal an error if non-ascii, non-eight-bit-XXX is contained). I don't see a need to add string-to-unibyte. buffer-make-multibyte buffer-as-multibyte (same as (set-buffer-multibyte BUFFER t)) buffer-to-multibyte I don't think buffer-make-multibyte and buffer-to-multibyte are useful. 
What is useful is to have functions to operate on a region in a multibyte buffer, transforming the text between these three different representations. (Some of the 6 transformations may be meaningless or impossible; we should only support the meaningful ones.) buffer-make-unibyte buffer-as-unibyte (same as (set-buffer-multibyte BUFFER nil)) buffer-to-unibyte I don't think any change here is worth making. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-21 18:18 ` Richard Stallman @ 2003-01-27 12:20 ` Kenichi Handa 2003-01-29 0:05 ` Richard Stallman 0 siblings, 1 reply; 49+ messages in thread From: Kenichi Handa @ 2003-01-27 12:20 UTC (permalink / raw) Cc: emacs-devel In article <E18b2yg-0007yl-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes: > string-make-multibyte > string-as-multibyte > string-to-multibyte > These three are all useful. > string-make-unibyte > string-as-unibyte > string-to-unibyte (perhaps the same as string-as-unibyte, or > it should signal an error if non-ascii, > non-eight-bit-XXX is contained). > I don't see a need to add string-to-unibyte. We have string-as/make-multibyte and string-as/make-unibyte. If one finds string-to-multibyte, it's quite natural that he also expects string-to-unibyte. Even if it is an alias of string-as-unibyte, I think it's worth having it. And, it's simpler to have it than saying that we don't have string-to-unibyte because ... in some place. And I think it's better that it signals an error as written above. > buffer-make-multibyte > buffer-as-multibyte (same as (set-buffer-multibyte BUFFER t)) > buffer-to-multibyte > I don't think buffer-make-multibyte and buffer-to-multibyte are > useful. What is useful is to have functions to operate on a region in > a multibyte buffer, transforming the text between these three > different representations. (Some of the 6 transformations may be > meaningless or impossible; we should only support the meaningful > ones.) I don't agree with having such a function. I think such a case is where we have to use decode/encode-coding-region. Eight-bit chars in a multibyte buffer actually represent raw-bytes. Then the operation of turning them to characters is "decoding", not transforming. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 49+ messages in thread
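The error-signalling behaviour proposed above for string-to-unibyte can be illustrated with a hedged sketch. Whether such a function exists is exactly what this thread is debating, so the code checks with `fboundp' first; `my-demo-try-to-unibyte' is a made-up wrapper, and the test character assumes an Emacs in which #x3042 is a valid non-ASCII, non-eight-bit character code.

```elisp
;; Sketch only.  `my-demo-try-to-unibyte' is a hypothetical wrapper.
;; `string-to-unibyte' is used only if the running Emacs defines it.
(defun my-demo-try-to-unibyte (string)
  "Try to convert STRING with `string-to-unibyte'.
Return the converted string, the symbol `refused' if the conversion
signals an error, or `unavailable' if the function is not defined."
  (if (not (fboundp 'string-to-unibyte))
      'unavailable
    (condition-case nil
        (string-to-unibyte string)
      (error 'refused))))

;; A raw byte that went through `string-to-multibyte' should convert back.
(my-demo-try-to-unibyte (string-to-multibyte (unibyte-string #xE9)))

;; A string holding a genuine multibyte character should be refused.
(my-demo-try-to-unibyte (string ?a #x3042))
```

The design choice at stake is the one Handa names: an erroring string-to-unibyte refuses to lose information, whereas an alias of string-as-unibyte would silently reinterpret the bytes.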
* Re: Emacs-diffs Digest, Vol 2, Issue 28 2003-01-27 12:20 ` Kenichi Handa @ 2003-01-29 0:05 ` Richard Stallman 0 siblings, 0 replies; 49+ messages in thread From: Richard Stallman @ 2003-01-29 0:05 UTC (permalink / raw) Cc: emacs-devel > I don't see a need to add string-to-unibyte. We have string-as/make-multibyte and string-as/make-unibyte. If one finds string-to-multibyte, it's quite natural that he also expects string-to-unibyte. Even if it is an alias of string-as-unibyte, I think it's worth having it. And, it's simpler to have it than saying that we don't have string-to-unibyte because ... in some place. I don't think we should add functions merely for completeness. ^ permalink raw reply [flat|nested] 49+ messages in thread