* Replacement for string-as-unibyte-function
@ 2021-01-31 23:01 Joe Riel
2021-02-01 3:03 ` Stefan Monnier
0 siblings, 1 reply; 12+ messages in thread
From: Joe Riel @ 2021-01-31 23:01 UTC (permalink / raw)
To: help-gnu-emacs
Apologies for this partial duplicate; after posting I realized I no longer
receive email from this group, so I re-subscribed. I am copying and responding
to the response from the website:
Eli Zaretskii asks:
> Please describe your use case: what are you trying to do that you
> needed string-as-unibyte?
Am handling a message passed in from an external process
(it's passed in chunks).
The header of the message specifies its length, in bytes.
Some of the characters may be Unicode. Am using
buffer-substring-no-properties to later extract the message.
To get its length right, each byte has to be a character in the buffer.
It appears as though
(encode-coding-string string 'utf-8-unix)
is the equivalent of (string-as-unibyte string).
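To illustrate why the character count and the byte count diverge, here is a minimal sketch. The helper name my/char-and-byte-counts and the sample text are assumptions made up for illustration; only encode-coding-string and multibyte-string-p are real Emacs functions.

```elisp
;; Hypothetical helper, not part of the original code.
(defun my/char-and-byte-counts (text)
  "Return (CHARS BYTES UNIBYTE-P) for TEXT encoded as UTF-8."
  (let ((bytes (encode-coding-string text 'utf-8-unix)))
    (list (length text)                       ; number of characters
          (length bytes)                      ; number of bytes after encoding
          (not (multibyte-string-p bytes))))) ; the encoded string is unibyte

;; "é" and "ï" each occupy two bytes in UTF-8.
(my/char-and-byte-counts "Héloïse")  ; => (7 9 t)
```

With ASCII-only text the two counts agree, which would explain why a length mismatch only shows up once non-ASCII characters arrive.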
--
Joe Riel
* Re: Replacement for string-as-unibyte-function
2021-01-31 23:01 Replacement for string-as-unibyte-function Joe Riel
@ 2021-02-01 3:03 ` Stefan Monnier
2021-02-01 5:55 ` Joe Riel
0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2021-02-01 3:03 UTC (permalink / raw)
To: help-gnu-emacs
> Am handling a message passed in from an external process
> (it's passed in chunks).
>
> The header of the message specifies its length, in bytes.
> Some of the characters may be Unicode. Am using
> buffer-substring-no-properties to later extract the message.
> To get its length right, each byte has to be a character in the buffer.
So make sure the buffer in which the process writes is unibyte with
(set-buffer-multibyte nil)
and make sure Emacs doesn't try to decode the process's output:
(set-process-coding-system <proc> 'binary)
(which you can also set directly when you launch the process, but how
you do it depends on the function you use to create the process).
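The two steps above can be combined into one setup function. This is only a sketch: the name my/prepare-raw-byte-process is hypothetical, it assumes the process already has a process buffer, and it keeps whatever encoding (for outgoing data) the process already had.

```elisp
;; Hypothetical helper, not part of the original code.
(defun my/prepare-raw-byte-process (proc)
  "Make PROC deliver undecoded bytes into its unibyte process buffer.
Assumes PROC already has a process buffer that is still empty."
  (with-current-buffer (process-buffer proc)
    ;; One buffer position per byte.
    (set-buffer-multibyte nil))
  ;; Decode incoming data as `binary' (no conversion); leave the coding
  ;; system used for outgoing data as it was.
  (set-process-coding-system proc 'binary
                             (cdr (process-coding-system proc)))
  proc)
```

It would be called once, right after creating the process and before any output arrives.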
Stefan
* Re: Replacement for string-as-unibyte-function
2021-02-01 3:03 ` Stefan Monnier
@ 2021-02-01 5:55 ` Joe Riel
2021-02-01 14:53 ` Eli Zaretskii
2021-02-01 15:01 ` Stefan Monnier
0 siblings, 2 replies; 12+ messages in thread
From: Joe Riel @ 2021-02-01 5:55 UTC (permalink / raw)
To: Stefan Monnier; +Cc: help-gnu-emacs
On Sun, 31 Jan 2021 22:03:42 -0500
Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> > Am handling a message passed in from an external process
> > (it's passed in chunks).
> >
> > The header of the message specifies its length, in bytes.
> > Some of the characters may be Unicode. Am using
> > buffer-substring-no-properties to later extract the message.
> > To get its length right, each byte has to be a character in the buffer.
>
> So make sure the buffer in which the process writes is unibyte with
>
> (set-buffer-multibyte nil)
>
> and make sure Emacs doesn't try to decode the process's output:
>
> (set-process-coding-system <proc> 'binary)
>
> (which you can also set directly when you launch the process, but how
> you do it depends on the function you use to create the process).
I'm actually using make-network-process (to communicate via tls).
The filter function inserts the string into a buffer.
I tried using (set-buffer-multibyte nil) and (insert string),
but that doesn't work. What does work is omitting the call to
set-buffer-multibyte and using
(insert (encode-coding-string string 'utf-8-unix)).
Previously I used (insert (string-as-unibyte string)).
--
Joe Riel
* Re: Replacement for string-as-unibyte-function
2021-02-01 5:55 ` Joe Riel
@ 2021-02-01 14:53 ` Eli Zaretskii
2021-02-01 16:33 ` Joe Riel
2021-02-01 15:01 ` Stefan Monnier
1 sibling, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2021-02-01 14:53 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Sun, 31 Jan 2021 21:55:55 -0800
> From: Joe Riel <joer@san.rr.com>
> Cc: help-gnu-emacs@gnu.org
>
> > (set-process-coding-system <proc> 'binary)
> >
> > (which you can also set directly when you launch the process, but how
> > you do it depends on the function you use to create the process).
>
> I'm actually using make-network-process (to communicate via tls).
make-network-process accepts the :coding attribute, which you could
use instead of what Stefan suggests above.
> I tried using (set-buffer-multibyte nil) and (insert string),
> but that doesn't work.
Please show how you tried that. The effect could depend on the
details and the timing of that call.
* Re: Replacement for string-as-unibyte-function
2021-02-01 14:53 ` Eli Zaretskii
@ 2021-02-01 16:33 ` Joe Riel
2021-02-01 17:05 ` Eli Zaretskii
0 siblings, 1 reply; 12+ messages in thread
From: Joe Riel @ 2021-02-01 16:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: help-gnu-emacs
On Mon, 01 Feb 2021 16:53:32 +0200
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Sun, 31 Jan 2021 21:55:55 -0800
> > From: Joe Riel <joer@san.rr.com>
> > Cc: help-gnu-emacs@gnu.org
> >
> > > (set-process-coding-system <proc> 'binary)
> > >
> > > (which you can also set directly when you launch the process, but how
> > > you do it depends on the function you use to create the process).
> >
> > I'm actually using make-network-process (to communicate via tls).
>
> make-network-process accepts the :coding attribute, which you could
> use instead of what Stefan suggests above.
>
> > I tried using (set-buffer-multibyte nil) and (insert string),
> > but that doesn't work.
>
> Please show how you tried that. The effect could depend on the
> details and the timing of that call.
>
I'm using the :filter option, not the :buffer option, in make-network-process.
  (make-network-process
   :name "mds"
   :family 'ipv4
   :service mds-port
   :sentinel 'mds-sentinel
   :filter 'mds-filter
   :server 't)
That is done because the server handles multiple clients, so the filter
function routes the data to the appropriate buffer. It isn't clear to
me whether using :coding then has an effect; I haven't seen it.
I tried setting up each client buffer with
(with-current-buffer buf (set-buffer-multibyte nil))
and, in the filter function, just calling
(insert string)
but, as mentioned, that doesn't do the same as skipping the call to set-buffer-multibyte
and doing
(insert (encode-coding-string string 'utf-8-unix))
--
Joe Riel
* Re: Replacement for string-as-unibyte-function
2021-02-01 16:33 ` Joe Riel
@ 2021-02-01 17:05 ` Eli Zaretskii
2021-02-01 23:43 ` Joe Riel
0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2021-02-01 17:05 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Mon, 1 Feb 2021 08:33:52 -0800
> From: Joe Riel <jriel@maplesoft.com>
> CC: <help-gnu-emacs@gnu.org>
>
> (make-network-process
> :name "mds"
> :family 'ipv4
> :service mds-port
> :sentinel 'mds-sentinel
> :filter 'mds-filter
> :server 't)
>
> That is done because the server handles multiple clients, so the filter
> function routes the data to the appropriate buffer. It isn't clear to
> me whether using :coding then has an effect; I haven't seen it.
>
> I tried setting up each client buffer with
>
> (with-current-buffer buf (set-buffer-multibyte nil))
>
> and, in the filter function, just calling
>
> (insert string)
With or without the :coding attribute? I guess without, which is why
it didn't work.
Also, make sure the above is run before the filter function is called
the first time.
* Re: Replacement for string-as-unibyte-function
2021-02-01 17:05 ` Eli Zaretskii
@ 2021-02-01 23:43 ` Joe Riel
2021-02-02 3:30 ` Eli Zaretskii
0 siblings, 1 reply; 12+ messages in thread
From: Joe Riel @ 2021-02-01 23:43 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: help-gnu-emacs
On Mon, 01 Feb 2021 19:05:06 +0200
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Mon, 1 Feb 2021 08:33:52 -0800
> > From: Joe Riel <jriel@maplesoft.com>
> > CC: <help-gnu-emacs@gnu.org>
> >
> > (make-network-process
> > :name "mds"
> > :family 'ipv4
> > :service mds-port
> > :sentinel 'mds-sentinel
> > :filter 'mds-filter
> > :server 't)
> >
> > That is done because the server handles multiple clients, so the filter
> > function routes the data to the appropriate buffer. It isn't clear to
> > me whether using :coding then has an effect; I haven't seen it.
> >
> > I tried setting up each client buffer with
> >
> > (with-current-buffer buf (set-buffer-multibyte nil))
> >
> > and, in the filter function, just calling
> >
> > (insert string)
>
> With or without the :coding attribute? I guess without, which is why
> it didn't work.
I tried it with :coding 'binary (and with other changes mentioned).
Didn't work. That is, it worked fine if the strings being sent
were ascii. But when they contained unicode, the count would no
longer match.
> Also, make sure the above is run before the filter function is called
> the first time.
Ah, that's the trick. Thanks. When I do that it works fine with both
unicode and ascii source. How does the function (or its usage)
change?
--
Joe Riel
* Re: Replacement for string-as-unibyte-function
2021-02-01 23:43 ` Joe Riel
@ 2021-02-02 3:30 ` Eli Zaretskii
2021-02-02 3:51 ` Joe Riel
0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2021-02-02 3:30 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Mon, 1 Feb 2021 15:43:22 -0800
> From: Joe Riel <jriel@maplesoft.com>
> CC: <help-gnu-emacs@gnu.org>
>
> > Also, make sure the above is run before the filter function is called
> > the first time.
>
> Ah, that's the trick. Thanks. When I do that it works fine with both
> unicode and ascii source. How does the function (or its usage)
> change?
I'm sorry, I didn't understand the question. Can you explain what you
are asking?
* Re: Replacement for string-as-unibyte-function
2021-02-02 3:30 ` Eli Zaretskii
@ 2021-02-02 3:51 ` Joe Riel
2021-02-02 14:58 ` Eli Zaretskii
0 siblings, 1 reply; 12+ messages in thread
From: Joe Riel @ 2021-02-02 3:51 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: help-gnu-emacs
On Tue, 02 Feb 2021 05:30:16 +0200
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Mon, 1 Feb 2021 15:43:22 -0800
> > From: Joe Riel <jriel@maplesoft.com>
> > CC: <help-gnu-emacs@gnu.org>
> >
> > > Also, make sure the above is run before the filter function is called
> > > the first time.
> >
> > Ah, that's the trick. Thanks. When I do that it works fine with both
> > unicode and ascii source. How does the function (or its usage)
> > change?
>
> I'm sorry, I didn't understand the question. Can you explain what you
> are asking?
>
I realized it was unclear after posting. What is different about the filter
function if I re-evaluate it and then rerun the program, so that the filter
function gets called, effectively, for the first time, after the call to
make-network-process? Does the presence of the :coding 'binary option
add advice to the filter function?
--
Joe Riel
* Re: Replacement for string-as-unibyte-function
2021-02-02 3:51 ` Joe Riel
@ 2021-02-02 14:58 ` Eli Zaretskii
0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2021-02-02 14:58 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Mon, 1 Feb 2021 19:51:31 -0800
> From: Joe Riel <jriel@maplesoft.com>
> CC: <help-gnu-emacs@gnu.org>
>
> I realized it was unclear after posting. What is different about the filter
> function if I re-evaluate it and then rerun the program, so that the filter
> function gets called, effectively, for the first time, after the call to
> make-network-process? Does the presence of the :coding 'binary option
> add advice to the filter function?
No, what's important is that the first time the filter function runs
and inserts something into the buffer, the buffer is already unibyte.
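One way to guarantee that ordering is to let the filter obtain its buffer through a helper that makes the buffer unibyte at creation time. The following is a sketch only; my/client-buffer and my/route-to-client are hypothetical names, the buffer-naming scheme is an assumption, and it presumes the process was created with :coding 'binary so that the string passed to the filter is unibyte.

```elisp
;; Hypothetical helpers, not part of the original code.
(defun my/client-buffer (name)
  "Return the buffer named NAME, creating it unibyte if it does not exist."
  (or (get-buffer name)
      (with-current-buffer (get-buffer-create name)
        ;; Switch to unibyte while the buffer is still empty, that is,
        ;; before the filter has inserted anything.
        (set-buffer-multibyte nil)
        (current-buffer))))

(defun my/route-to-client (proc string)
  "Process filter: append STRING from PROC to that client's buffer."
  (with-current-buffer (my/client-buffer
                        (format " *%s*" (process-name proc)))
    (goto-char (point-max))
    (insert string)))
```

Because the buffer is unibyte from the start, each received byte occupies one buffer position, so positions can be compared directly with the byte length given in the message header.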
* Re: Replacement for string-as-unibyte-function
2021-02-01 5:55 ` Joe Riel
2021-02-01 14:53 ` Eli Zaretskii
@ 2021-02-01 15:01 ` Stefan Monnier
2021-02-01 16:05 ` <somecodingsystem> (was: Re: Replacement for string-as-unibyte-function) moasenwood--- via Users list for the GNU Emacs text editor
1 sibling, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2021-02-01 15:01 UTC (permalink / raw)
To: Joe Riel; +Cc: help-gnu-emacs
>> So make sure the buffer in which the process writes is unibyte with
>>
>> (set-buffer-multibyte nil)
>>
>> and make sure Emacs doesn't try to decode the process's output:
>>
>> (set-process-coding-system <proc> 'binary)
>>
>> (which you can also set directly when you launch the process, but how
>> you do it depends on the function you use to create the process).
>
> I'm actually using make-network-process (to communicate via tls).
Then use something like
(make-network-process ... :coding 'binary ...)
or
(make-network-process ... :coding '(binary . utf-8) ...)
> The filter function inserts the string into a buffer.
Emacs receives the data from the process as a sequence of *bytes* (after
all, that's the only thing available in POSIX communication). So in
order to pass a sequence of *chars* (aka "a multibyte string") to the
process filter, Emacs's internal C code has to do the equivalent of
(*de*code-coding-string "thedatareceived" '<somecodingsystem>)
where <somecodingsystem> is the coding system that
`make-network-process` decided to use for that process.
And then you come along and want to call `encode-coding-string` on the
result: better use `:coding` as I suggested above in order to cut the
middle man.
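The round trip described above can be reproduced without a process. This is a sketch under stated assumptions: my/decode-encode-round-trip is a hypothetical name, and utf-8-unix stands in for whatever coding system make-network-process would have picked.

```elisp
;; Hypothetical helper, not part of the original code.
(defun my/decode-encode-round-trip (text)
  "Return (BYTES-RECEIVED CHARS-DECODED SAME-BYTES-P) for TEXT."
  (let* (;; Stand-in for the raw bytes arriving from the process.
         (received (encode-coding-string text 'utf-8-unix))
         ;; What Emacs does before calling the filter when a decoding
         ;; coding system is in effect.
         (decoded (decode-coding-string received 'utf-8-unix))
         ;; What the filter then does to get the bytes back.
         (re-encoded (encode-coding-string decoded 'utf-8-unix)))
    (list (length received)
          (length decoded)
          (string= received re-encoded))))

(my/decode-encode-round-trip "Héloïse")  ; => (9 7 t)
```

The re-encoded string equals the bytes that were received in the first place, which is why asking for 'binary up front makes both conversions unnecessary.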
Stefan
* <somecodingsystem> (was: Re: Replacement for string-as-unibyte-function)
2021-02-01 15:01 ` Stefan Monnier
@ 2021-02-01 16:05 ` moasenwood--- via Users list for the GNU Emacs text editor
0 siblings, 0 replies; 12+ messages in thread
From: moasenwood--- via Users list for the GNU Emacs text editor @ 2021-02-01 16:05 UTC (permalink / raw)
To: help-gnu-emacs
Stefan Monnier wrote:
> Emacs receives the data from the process as a sequence of
> *bytes* (after all, that's the only thing available in POSIX
> communication). So in order to pass a sequence of *chars*
> (aka "a multibyte string") to the process filter, Emacs's
> internal C code has to do the equivalent of
>
> (*de*code-coding-string "thedatareceived" '<somecodingsystem>)
>
> where <somecodingsystem> is the coding system that
> `make-network-process` decided to use for that process.
Interesting. Are we talking here about computer communication in the
protocol sense? A protocol is basically the format of the message
(the syntax of the packet: usually fields with different
metadata, then the payload, i.e. the actual message; the
metadata can refer both to the payload, e.g., its length, and
to the communication itself). Furthermore, the protocol
stipulates the way messages should be sent between hosts
(e.g., in what order). And last but not least, once syntax and
procedure are covered, there is what it all means: the semantics.
This is illustrated with Alice and Bob in the Anglo-American
world and Abelard and Héloïse in the Francophone world, with
arrows going back and forth between them. (Apparently they
existed, and exchanged letters in the 12th century [1].) (I
usually joke about "Care of Héloïse" when I clean and organize
all my zillion tools and toolboxes. Get it? Instead of "Care
of Kit" :) [2])
This is what I remember from school anyway. Or is it more like
Unix IPC? In that case, what I remember ... err, we all use
IPC every day.
What I _don't_ have a snappy answer for, not from school and
not from everyday life, is <somecodingsystem>? What is
a coding system?
[1] https://en.wikipedia.org/wiki/H%C3%A9lo%C3%AFse
[2] https://dataswamp.org/~incal/tree-house/care-of-heloise.jpg [photo]
--
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal