* Replacement for string-as-unibyte-function
@ 2021-01-31 23:01 Joe Riel
2021-02-01 3:03 ` Stefan Monnier
0 siblings, 1 reply; 12+ messages in thread
From: Joe Riel @ 2021-01-31 23:01 UTC (permalink / raw)
To: help-gnu-emacs
Apologies for this partial duplicate; after posting realized I no longer
receive email from this group, so renewed. Am copying and responding to response
from the website:
Eli Zaretskii asks:
> Please describe your use case: what are you trying to do that you
> needed string-as-unibyte?
Am handling a message passed in from an external process
(it's passed in chunks).
The header of the message specifies its length, in bytes.
Some of the characters may be unicode. Am using
buffer-substring-no-properties to later extract the message.
To get its length right, each byte has to be a character in the buffer.
It appears as though
(encode-coding-string string 'utf-8-unix)
is the equivalent of (string-as-unibyte string).
--
Joe Riel
^ permalink raw reply [flat|nested] 12+ messages in thread
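A small sketch of why a length given in bytes can disagree with a length counted in characters. It uses only standard functions (`encode-coding-string`, `multibyte-string-p`); the helper name and the sample string are made up for illustration. It does not show that encoding matches `string-as-unibyte` in every case.

```elisp
;; Hypothetical helper, not from the thread: compare the character
;; count of STRING with the byte count of its UTF-8 encoding.
(defun my-char-and-byte-counts (string)
  "Return (CHARS BYTES MULTIBYTE-BEFORE MULTIBYTE-AFTER) for STRING."
  (let ((bytes (encode-coding-string string 'utf-8-unix)))
    (list (length string)               ; characters
          (length bytes)                ; bytes after UTF-8 encoding
          (multibyte-string-p string)
          (multibyte-string-p bytes)))) ; encoded strings are unibyte

;; "h\u00e9llo" has 5 characters; U+00E9 takes 2 bytes in UTF-8.
(my-char-and-byte-counts "h\u00e9llo")  ; => (5 6 t nil)
```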
* Re: Replacement for string-as-unibyte-function
From: Stefan Monnier @ 2021-02-01 3:03 UTC
To: help-gnu-emacs

> Am handling a message passed in from an external process
> (it's passed in chunks).
>
> The header of the message specifies its length, in bytes.
> Some of the characters may be unicode. Am using
> buffer-substring-no-properties to later extract the message.
> To get its length right, each byte has to be a character in the buffer.

So make sure the buffer in which the process writes is unibyte with

    (set-buffer-multibyte nil)

and make sure Emacs doesn't try to decode the process's output:

    (set-process-coding-system <proc> 'binary)

(which you can also set directly when you launch the process, but how
you do it depends on the function you use to create the process).

        Stefan
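To illustrate what "unibyte" means for a buffer, here is a minimal sketch that reports a buffer's size before and after `set-buffer-multibyte` is called with nil. The helper name is hypothetical. Converting a buffer that already has contents is done here only to make the difference visible; the advice in the message is to make the buffer unibyte up front.

```elisp
;; Hypothetical helper: insert TEXT in a temporary (multibyte) buffer,
;; then switch the buffer to unibyte and compare the two sizes.
(defun my-sizes-around-unibyte (text)
  "Return (CHARS BYTES): buffer size while multibyte, then as unibyte."
  (with-temp-buffer
    (insert text)
    (let ((chars (buffer-size)))      ; counted in characters
      (set-buffer-multibyte nil)      ; same bytes, now one byte per position
      (list chars (buffer-size)))))

(my-sizes-around-unibyte "h\u00e9llo")   ; => (5 6)

;; For a live process object PROC one would also call, as suggested above:
;;   (set-process-coding-system proc 'binary 'binary)
```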
* Re: Replacement for string-as-unibyte-function
From: Joe Riel @ 2021-02-01 5:55 UTC
To: Stefan Monnier; +Cc: help-gnu-emacs

On Sun, 31 Jan 2021 22:03:42 -0500
Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > Am handling a message passed in from an external process
> > (it's passed in chunks).
> >
> > The header of the message specifies its length, in bytes.
> > Some of the characters may be unicode. Am using
> > buffer-substring-no-properties to later extract the message.
> > To get its length right, each byte has to be a character in the buffer.
>
> So make sure the buffer in which the process writes is unibyte with
>
>     (set-buffer-multibyte nil)
>
> and make sure Emacs doesn't try to decode the process's output:
>
>     (set-process-coding-system <proc> 'binary)
>
> (which you can also set directly when you launch the process, but how
> you do it depends on the function you use to create the process).

I'm actually using make-network-process (to communicate via tls).
The filter function inserts the string into a buffer.
I tried using (set-buffer-multibyte nil) and (insert string),
but that doesn't work. What does work is omitting the call to
set-buffer-multibyte and using
(insert (encode-coding-string string 'utf-8-unix)).
Previously I used (insert (string-as-unibyte string)).

--
Joe Riel
* Re: Replacement for string-as-unibyte-function
From: Eli Zaretskii @ 2021-02-01 14:53 UTC
To: help-gnu-emacs

> Date: Sun, 31 Jan 2021 21:55:55 -0800
> From: Joe Riel <joer@san.rr.com>
> Cc: help-gnu-emacs@gnu.org
>
> >     (set-process-coding-system <proc> 'binary)
> >
> > (which you can also set directly when you launch the process, but how
> > you do it depends on the function you use to create the process).
>
> I'm actually using make-network-process (to communicate via tls).

make-network-process accepts the :coding attribute, which you could
use instead of what Stefan suggests above.

> I tried using (set-buffer-multibyte nil) and (insert string),
> but that doesn't work.

Please show how you tried that. The effect could depend on the
details and the timing of that call.
* Re: Replacement for string-as-unibyte-function
From: Joe Riel @ 2021-02-01 16:33 UTC
To: Eli Zaretskii; +Cc: help-gnu-emacs

On Mon, 01 Feb 2021 16:53:32 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Sun, 31 Jan 2021 21:55:55 -0800
> > From: Joe Riel <joer@san.rr.com>
> > Cc: help-gnu-emacs@gnu.org
> >
> > >     (set-process-coding-system <proc> 'binary)
> > >
> > > (which you can also set directly when you launch the process, but how
> > > you do it depends on the function you use to create the process).
> >
> > I'm actually using make-network-process (to communicate via tls).
>
> make-network-process accepts the :coding attribute, which you could
> use instead of what Stefan suggests above.
>
> > I tried using (set-buffer-multibyte nil) and (insert string),
> > but that doesn't work.
>
> Please show how you tried that. The effect could depend on the
> details and the timing of that call.

I'm using the :filter option, not the :buffer option, in
make-network-process.

    (make-network-process
     :name "mds"
     :family 'ipv4
     :service mds-port
     :sentinel 'mds-sentinel
     :filter 'mds-filter
     :server 't)

That is done because the server handles multiple clients, so the filter
function routes the data to the appropriate buffer. It isn't clear to
me whether using :coding then has an effect; I haven't seen it.

I tried setting up each client buffer with

    (with-current-buffer buf (set-buffer-multibyte nil))

and, in the filter function, just calling

    (insert string)

but, as mentioned, that doesn't do the same as skipping the
call to set-buffer-multibyte and doing

    (insert (encode-coding-string string 'utf-8-unix))

--
Joe Riel
* Re: Replacement for string-as-unibyte-function
From: Eli Zaretskii @ 2021-02-01 17:05 UTC
To: help-gnu-emacs

> Date: Mon, 1 Feb 2021 08:33:52 -0800
> From: Joe Riel <jriel@maplesoft.com>
> CC: <help-gnu-emacs@gnu.org>
>
> (make-network-process
>  :name "mds"
>  :family 'ipv4
>  :service mds-port
>  :sentinel 'mds-sentinel
>  :filter 'mds-filter
>  :server 't)
>
> That is done because the server handles multiple clients, so the filter
> function routes the data to the appropriate buffer. It isn't clear to
> me whether using :coding then has an effect; I haven't seen it.
>
> I tried setting up each client buffer with
>
>   (with-current-buffer buf (set-buffer-multibyte nil))
>
> and, in the filter function, just calling
>
>   (insert string)

With or without the :coding attribute? I guess without, which is why
it didn't work.

Also, make sure the above is run before the filter function is called
the first time.
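As a sketch only, this is how the call quoted above might look with the `:coding` attribute added. The wrapper function `my-start-server` and its `port` argument are hypothetical. `mds-sentinel` and `mds-filter` are the names used in the thread and are not defined here, so the function is shown without being called.

```elisp
;; Hypothetical wrapper around the call from the thread.
;; The only change is the added :coding 'binary, which asks Emacs not to
;; decode incoming data, so the filter receives unibyte strings.
(defun my-start-server (port)
  "Start a server on PORT that passes raw bytes to its filter (sketch)."
  (make-network-process
   :name "mds"
   :family 'ipv4
   :service port
   :server t
   :coding 'binary
   :sentinel #'mds-sentinel     ; named in the thread, not defined here
   :filter #'mds-filter))       ; named in the thread, not defined here
```

Each client buffer would still need `(set-buffer-multibyte nil)` before the filter first inserts into it, as the message says.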
* Re: Replacement for string-as-unibyte-function
From: Joe Riel @ 2021-02-01 23:43 UTC
To: Eli Zaretskii; +Cc: help-gnu-emacs

On Mon, 01 Feb 2021 19:05:06 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Mon, 1 Feb 2021 08:33:52 -0800
> > From: Joe Riel <jriel@maplesoft.com>
> > CC: <help-gnu-emacs@gnu.org>
> >
> > (make-network-process
> >  :name "mds"
> >  :family 'ipv4
> >  :service mds-port
> >  :sentinel 'mds-sentinel
> >  :filter 'mds-filter
> >  :server 't)
> >
> > That is done because the server handles multiple clients, so the filter
> > function routes the data to the appropriate buffer. It isn't clear to
> > me whether using :coding then has an effect; I haven't seen it.
> >
> > I tried setting up each client buffer with
> >
> >   (with-current-buffer buf (set-buffer-multibyte nil))
> >
> > and, in the filter function, just calling
> >
> >   (insert string)
>
> With or without the :coding attribute? I guess without, which is why
> it didn't work.

I tried it with :coding 'binary (and with other changes mentioned).
Didn't work. That is, it worked fine if the strings being sent
were ascii. But when they contained unicode, the count would no
longer match.

> Also, make sure the above is run before the filter function is called
> the first time.

Ah, that's the trick. Thanks. When I do that it works fine with both
unicode and ascii source. How does the function (or its usage)
change?

--
Joe Riel
* Re: Replacement for string-as-unibyte-function
From: Eli Zaretskii @ 2021-02-02 3:30 UTC
To: help-gnu-emacs

> Date: Mon, 1 Feb 2021 15:43:22 -0800
> From: Joe Riel <jriel@maplesoft.com>
> CC: <help-gnu-emacs@gnu.org>
>
> > Also, make sure the above is run before the filter function is called
> > the first time.
>
> Ah, that's the trick. Thanks. When I do that it works fine with both
> unicode and ascii source. How does the function (or its usage)
> change?

I'm sorry, I didn't understand the question. Can you explain what you
are asking?
* Re: Replacement for string-as-unibyte-function
From: Joe Riel @ 2021-02-02 3:51 UTC
To: Eli Zaretskii; +Cc: help-gnu-emacs

On Tue, 02 Feb 2021 05:30:16 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Mon, 1 Feb 2021 15:43:22 -0800
> > From: Joe Riel <jriel@maplesoft.com>
> > CC: <help-gnu-emacs@gnu.org>
> >
> > > Also, make sure the above is run before the filter function is called
> > > the first time.
> >
> > Ah, that's the trick. Thanks. When I do that it works fine with both
> > unicode and ascii source. How does the function (or its usage)
> > change?
>
> I'm sorry, I didn't understand the question. Can you explain what you
> are asking?

I realized it was unclear after posting. What is different about the filter
function if I re-evaluate it and then rerun the program, so that the filter
function gets called, effectively, for the first time, after the call to
make-network-process? Does the presence of the :coding 'binary option
add advice to the filter function?

--
Joe Riel
* Re: Replacement for string-as-unibyte-function
From: Eli Zaretskii @ 2021-02-02 14:58 UTC
To: help-gnu-emacs

> Date: Mon, 1 Feb 2021 19:51:31 -0800
> From: Joe Riel <jriel@maplesoft.com>
> CC: <help-gnu-emacs@gnu.org>
>
> I realized it was unclear after posting. What is different about the filter
> function if I re-evaluate it and then rerun the program, so that the filter
> function gets called, effectively, for the first time, after the call to
> make-network-process? Does the presence of the :coding 'binary option
> add advice to the filter function?

No, what's important is that the first time the filter function runs
and inserts something into the buffer, the buffer is already unibyte.
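One way to guarantee that ordering is to make the buffer unibyte at the moment it is created, inside the code path the filter uses. The sketch below assumes the filter looks up a per-client buffer by name; both helper names are hypothetical, and `encode-coding-string` stands in for the raw bytes a process with `:coding 'binary` would deliver.

```elisp
;; Hypothetical helpers, not from the thread.
(defun my-client-buffer (name)
  "Return buffer NAME, creating it as a unibyte buffer if needed."
  (or (get-buffer name)
      (with-current-buffer (get-buffer-create name)
        (set-buffer-multibyte nil)   ; done while the buffer is still empty
        (current-buffer))))

(defun my-append-chunk (name chunk)
  "Append the unibyte string CHUNK to client buffer NAME; return its size."
  (with-current-buffer (my-client-buffer name)
    (goto-char (point-max))
    (insert chunk)
    (buffer-size)))                  ; one buffer position per byte

;; Two chunks of a UTF-8 message, 3 bytes and then 3 more:
(my-append-chunk " *my-client*" (encode-coding-string "h\u00e9" 'utf-8))  ; => 3
(my-append-chunk " *my-client*" (encode-coding-string "llo" 'utf-8))      ; => 6
```

With this arrangement the size of the buffer can be compared directly with the byte length announced in the message header.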
* Re: Replacement for string-as-unibyte-function
From: Stefan Monnier @ 2021-02-01 15:01 UTC
To: Joe Riel; +Cc: help-gnu-emacs

>> So make sure the buffer in which the process writes is unibyte with
>>
>>     (set-buffer-multibyte nil)
>>
>> and make sure Emacs doesn't try to decode the process's output:
>>
>>     (set-process-coding-system <proc> 'binary)
>>
>> (which you can also set directly when you launch the process, but how
>> you do it depends on the function you use to create the process).
>
> I'm actually using make-network-process (to communicate via tls).

Then use something like

    (make-network-process ... :coding 'binary ...)
or
    (make-network-process ... :coding '(binary . utf-8) ...)

> The filter function inserts the string into a buffer.

Emacs receives the data from the process as a sequence of *bytes*
(after all, that's the only thing available in POSIX communication).
So in order to pass a sequence of *chars* (aka "a multibyte string") to
the process filter, Emacs's internal C code has to do the equivalent of

    (*de*code-coding-string "thedatareceived" '<somecodingsystem>)

where <somecodingsystem> is the coding system that
`make-network-process` decided to use for that process.

And then you come along and want to call `encode-coding-string` on the
result: better use `:coding` as I suggested above in order to cut out
the middle man.

        Stefan
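The decode-then-encode round trip described above can be sketched without any process at all. The byte values and the helper name are illustrative assumptions, and `utf-8` stands in for whatever coding system Emacs would have picked for the process.

```elisp
;; Hypothetical helper: decode WIRE (a unibyte string) the way Emacs would
;; before calling a filter, then encode it again as the filter did.
(defun my-round-trip (wire)
  "Return (WIRE-BYTES DECODED-CHARS SAME-BYTES-P) for unibyte string WIRE."
  (let* ((decoded (decode-coding-string wire 'utf-8))    ; bytes -> chars
         (again (encode-coding-string decoded 'utf-8)))  ; chars -> bytes
    (list (length wire) (length decoded) (equal wire again))))

;; #x68 is "h"; #xc3 #xa9 is the UTF-8 encoding of U+00E9.
(my-round-trip (unibyte-string #x68 #xc3 #xa9))   ; => (3 2 t)
```

For valid UTF-8 input the second step only undoes the first, which is the work that `:coding 'binary` avoids.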
* <somecodingsystem> (was: Re: Replacement for string-as-unibyte-function)
From: moasenwood--- via Users list for the GNU Emacs text editor @ 2021-02-01 16:05 UTC
To: help-gnu-emacs

Stefan Monnier wrote:

> Emacs receives the data from the process as a sequence of
> *bytes* (after all, that's the only thing available in POSIX
> communication). So in order to pass a sequence of *chars*
> (aka "a multibyte string") to the process filter, Emacs's
> internal C code has to do the equivalent of
>
>     (*de*code-coding-string "thedatareceived" '<somecodingsystem>)
>
> where <somecodingsystem> is the coding system that
> `make-network-process` decided to use for that process.

Interesting. Are we talking here about computer communication in
the protocol sense? That is basically the format of the message
(the syntax of the packet: usually it has fields with different
metadata, then the payload, i.e. the actual message; BTW the
metadata can refer both to the payload, e.g. its length, and to
the communication itself). Furthermore, the protocol stipulates
the way messages should be sent between hosts (e.g., in what
order). Last but not least, once syntax and procedure are
covered, there is what it all means: the semantics.

This is illustrated with Alice and Bob in the Anglo-American
world and Abelard and Héloïse in the Francophone world, with
arrows going back and forth between them. (Apparently they
existed, and exchanged letters in the 12th century [1].)

(I usually joke about "Care of Héloïse" when I clean and organize
all my zillion tools and toolboxes. Get it? Instead of "Care of
Kit" :) [2])

This is what I remember from school anyway.

Or is it more like Unix IPC? In that case, what I remember ...
err, we all use IPC every day.

What I _don't_ have a snappy answer for, not from school and not
from everyday life, is <somecodingsystem>. What is a coding system?

[1] https://en.wikipedia.org/wiki/H%C3%A9lo%C3%AFse
[2] https://dataswamp.org/~incal/tree-house/care-of-heloise.jpg [photo]

--
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal
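On the closing question: in Emacs a coding system names a rule for converting between characters and bytes, and it is the second argument of `encode-coding-string` and `decode-coding-string`. The following sketch, with a hypothetical helper name and a sample character chosen for illustration, shows the same character producing different byte sequences under two coding systems.

```elisp
;; Hypothetical helper: byte length of STRING under a few coding systems.
(defun my-encoded-lengths (string)
  "Return an alist of (CODING-SYSTEM . BYTE-LENGTH) for STRING."
  (mapcar (lambda (cs)
            (cons cs (length (encode-coding-string string cs))))
          '(utf-8 iso-latin-1)))

;; U+00E9 is two bytes in UTF-8 but a single byte in Latin-1.
(my-encoded-lengths "\u00e9")   ; => ((utf-8 . 2) (iso-latin-1 . 1))

;; Going the other way, the byte #xe9 read as Latin-1 is that character.
(decode-coding-string (unibyte-string #xe9) 'iso-latin-1)   ; => "é"
```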