how to calculate the size of string in bytes?

* how to calculate the size of string in bytes?
@ 2015-08-18  9:11 Sam Halliday
  2015-08-18 10:13 ` tomas
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Sam Halliday @ 2015-08-18  9:11 UTC (permalink / raw)
  To: help-gnu-emacs

Hi all,

We've had to change the ENSIME protocol to be more friendly to other editors and this has meant changing how we frame TCP messages.

We used to have a 6 character hex number at the start of each message that counted the number of multibyte characters, but we'd like to change it to be the number of bytes in the message.

We're sending the string to `process-send-string' and `read'ing from the associated network buffer. But when calculating the outgoing length of the string that we want to send, we use `length' --- but we need this to be `length-in-bytes' not the number of multibyte chars. Is there a built in function to do this or am I going to have to iterate the string and count the byte size of each character?

A quick test shows that

  (length (encode-coding-string "EURO" 'raw-text))

seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for Euro), but I am not 100% sure if this is correct.

Similarly, when we read from the network, we want to ensure that we `read' numbers of bytes, not multibyte chars. I *think* we are doing the right thing here, but if somebody could check, that would be greatly appreciated.

These are the relevant parts of our Emacs code:

;; https://github.com/ensime/ensime-emacs/blob/master/ensime-client.el#L507
(defun ensime-net-send (sexp proc)
  (let* ((msg (concat (ensime-prin1-to-string sexp) "\n"))
         (string (concat (ensime-net-encode-length (length msg)) msg))
         (coding-system (cdr (process-coding-system proc))))
    (when ensime--debug-messages (message "--> %s" sexp))
    (ensime-log-event sexp)
    (process-send-string proc string)))

;; https://github.com/ensime/ensime-emacs/blob/master/ensime-client.el#L584
(defun ensime-net-read ()
  "Read a message from the network buffer."
  (goto-char (point-min))
  (let* ((length (ensime-net-decode-length))
         (start (+ 6 (point)))
         (end (+ start length)))
    (assert (plusp length))
    (goto-char (byte-to-position start))
    (prog1 (read (current-buffer))
      (delete-region (- (byte-to-position start) 6)
                     (byte-to-position end)))))


* Re: how to calculate the size of string in bytes?
  2015-08-18  9:11 how to calculate the size of string in bytes? Sam Halliday
@ 2015-08-18 10:13 ` tomas
  2015-08-18 14:37   ` Eli Zaretskii
                     ` (2 more replies)
       [not found] ` <mailman.8504.1439892841.904.help-gnu-emacs@gnu.org>
  2015-08-18 14:34 ` Eli Zaretskii
  2 siblings, 3 replies; 19+ messages in thread
From: tomas @ 2015-08-18 10:13 UTC (permalink / raw)
  To: Sam Halliday; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> Hi all,
> 
> We've had to change the ENSIME protocol to be more friendly to other editors and this has meant changing how we frame TCP messages.
> 
> We used to have a 6 character hex number at the start of each message that counted the number of multibyte characters, but we'd like to change it to be the number of bytes in the message.
> 
> We're sending the string to `process-send-string' and `read'ing from the associated network buffer. But when calculating the outgoing length of the string that we want to send, we use `length' --- but we need this to be `length-in-bytes' not the number of multibyte chars. Is there a built in function to do this or am I going to have to iterate the string and count the byte size of each character?
> 
> A quick test shows that
> 
>   (length (encode-coding-string "EURO" 'raw-text))
> 
> seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for Euro), but I am not 100% sure if this is correct.

Raw is, afaik, Emacs's internal coding system. You don't want traces of it
in the network :-)

I'd expect you to use whichever coding system the network protocol prescribes
(these days it'd be UTF-8 by default). Things will (mostly) work for raw-text
since it's nearly UTF-8.

The really correct way to do this (AFAICS) would be to find out which encoding
process-send-string is going to use (via process-coding-system) and use *that*
in the length calculation -- this way you won't lie :-)

So I'd try this (slightly reordering the let*)

  (let* ((msg (concat (ensime-prin1-to-string sexp) "\n"))
         (coding-system (cdr (process-coding-system proc)))
         (string (concat (ensime-net-encode-length (length (encode-coding-string msg coding-system))) msg)))
    ...
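
A minimal sketch of the suggestion above, under a few assumptions: `my-net-encode-length' is a hypothetical stand-in for `ensime-net-encode-length' (whose definition is not shown in this thread), and the header is taken to be six zero-padded hex digits as described in the original question. The message is encoded only to measure it; `process-send-string' still does the real encoding.

```elisp
;; Hypothetical stand-in for `ensime-net-encode-length'; the real
;; definition is not shown in this thread.
(defun my-net-encode-length (n)
  "Encode N as a six-digit, zero-padded hex string."
  (format "%06x" n))

(defun my-net-frame (msg coding-system)
  "Return MSG prefixed with its size in bytes once encoded in CODING-SYSTEM.
The result is still a string of characters, meant to be handed to
`process-send-string', which performs the actual encoding."
  (let ((nbytes (length (encode-coding-string msg coding-system))))
    (concat (my-net-encode-length nbytes) msg)))

(defun my-net-send (msg proc)
  "Send MSG to PROC, framed with its byte length in PROC's output encoding."
  ;; `process-coding-system' returns (DECODING . ENCODING).
  (let ((coding-system (cdr (process-coding-system proc))))
    (process-send-string proc (my-net-frame msg coding-system))))
```

The header only matches what goes over the wire if the coding system used for counting is the one the process really uses for output, which is why it is read from the process rather than hard-coded.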


It seems somewhat wasteful to encode msg (to find its length) just
to let process-send-string encode again -- perhaps there's a better
idiom around for that. The use case seems common enough. Anyone?

regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXTBWAACgkQBcgs9XrR2kYjzACfVd/+R0wNKqWVt5sXxX/9WVj2
OjQAnRRuUdorjnIjd+tpL4z7frx1JGYZ
=yjMt
-----END PGP SIGNATURE-----




* Re: how to calculate the size of string in bytes?
       [not found] ` <mailman.8504.1439892841.904.help-gnu-emacs@gnu.org>
@ 2015-08-18 10:43   ` Sam Halliday
  2015-08-18 11:47     ` tomas
       [not found]     ` <mailman.8510.1439898432.904.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 19+ messages in thread
From: Sam Halliday @ 2015-08-18 10:43 UTC (permalink / raw)
  To: help-gnu-emacs

On Tuesday, 18 August 2015 11:14:04 UTC+1, to...@tuxteam.de  wrote:
> On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> > We used to have a 6 character hex number at the start of each message that counted the number of multibyte characters, but we'd like to change it to be the number of bytes in the message.
> > 
> > We're sending the string to `process-send-string' and `read'ing from the associated network buffer. But when calculating the outgoing length of the string that we want to send, we use `length' --- but we need this to be `length-in-bytes' not the number of multibyte chars. Is there a built in function to do this or am I going to have to iterate the string and count the byte size of each character?
> > 
> > A quick test shows that
> > 
> >   (length (encode-coding-string "EURO" 'raw-text))
> > 
> > seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for Euro), but I am not 100% sure if this is correct.
> 
> Raw is, afaik, Emacs's internal coding system. You don't want traces of it
> in the network :-)


We're not sending the message using raw, we're using UTF-8. But I need to calculate the length of the UTF-8 string IN BYTES as part of the payload (each message begins with a 6 character hex encoding of the following string's raw length).

I'm using "raw" to calculate an approximation of the UTF-8 string's byte length, but I am aware that it might not actually be true in the general case :-/

I don't think what you've suggested would actually change the semantics, but it would allow us to use a different encoding on the wire than the encoding of the string. We don't really need to worry about that at this stage, because all our users are using UTF-8. We'll keep it in mind though.



* Re: how to calculate the size of string in bytes?
  2015-08-18 10:43   ` Sam Halliday
@ 2015-08-18 11:47     ` tomas
       [not found]     ` <mailman.8510.1439898432.904.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 19+ messages in thread
From: tomas @ 2015-08-18 11:47 UTC (permalink / raw)
  To: Sam Halliday; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 03:43:44AM -0700, Sam Halliday wrote:
> On Tuesday, 18 August 2015 11:14:04 UTC+1, to...@tuxteam.de  wrote:
> > On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> > > We used to have a 6 character hex number at the start of each message that counted the number of multibyte characters, but we'd like to change it to be the number of bytes in the message.
> > > 
> > > We're sending the string to `process-send-string' and `read'ing from the associated network buffer. But when calculating the outgoing length of the string that we want to send, we use `length' --- but we need this to be `length-in-bytes' not the number of multibyte chars. Is there a built in function to do this or am I going to have to iterate the string and count the byte size of each character?
> > > 
> > > A quick test shows that
> > > 
> > >   (length (encode-coding-string "EURO" 'raw-text))
> > > 
> > > seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for Euro), but I am not 100% sure if this is correct.
> > 
> > Raw is, afaik, Emacs's internal coding system. You don't want traces of it
> > in the network :-)
> 
> 
> We're not sending the message using raw, we're using UTF-8. But I need to calculate the length of the UTF-8 string IN BYTES as part of the payload (each message begins with a 6 character hex encoding of the following string's raw length).

Yes, I get that. The way I understand encode-coding-string is that you give
it the target encoding:

  (length (encode-coding-string foo 'raw-text))

would mean "transform this string to whatever Emacs uses as internal
encoding and measure its length in bytes", whereas what you want is,
AFAIU "transform this string to UTF-8 and measure its length in bytes",
which would read as:

  (length (encode-coding-string foo 'utf-8))
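
As a small illustration of the difference between counting characters and counting encoded bytes, here is a sketch with a hypothetical helper, `my-utf8-byte-length'. It uses `utf-8-unix' rather than plain `utf-8' so that no end-of-line conversion can affect the count; that choice is an assumption of this sketch, not something stated in the thread.

```elisp
(defun my-utf8-byte-length (string)
  "Return the number of bytes STRING occupies once encoded as UTF-8."
  (length (encode-coding-string string 'utf-8-unix)))

;; `length' on the unencoded strings counts characters, so it is 1
;; for each of these; the helper counts bytes instead.
(my-utf8-byte-length "a")       ; ASCII letter
(my-utf8-byte-length "\u00a3")  ; POUND SIGN
(my-utf8-byte-length "\u20ac")  ; EURO SIGN
```

For these three inputs the helper gives 1, 2 and 3, matching the figures quoted in the original question.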

> I'm using "raw" to calculate an approximation of the UTF-8 string's byte length, but I am aware that it might not actually be true in the general case :-/

Use utf-8 then?

> I don't think what you've suggested would actually change the semantics, but it would allow us to use a different encoding on the wire than the encoding of the string. We don't really need to worry about that at this stage, because all our users are using UTF-8. We'll keep it in mind though.

But, but... isn't that a bug lurking? And it would be so easy to fix...
(that is unrelated to the above issue -- that I think you want utf-8
instead of raw)

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXTGzcACgkQBcgs9XrR2kbq/wCggTBpkebxoL9wIXzoFcSBZDAq
RqQAmwTy3yopi8MdM3r1xn9iQDXYRYWa
=ISij
-----END PGP SIGNATURE-----




* Re: how to calculate the size of string in bytes?
       [not found]     ` <mailman.8510.1439898432.904.help-gnu-emacs@gnu.org>
@ 2015-08-18 12:06       ` Sam Halliday
  0 siblings, 0 replies; 19+ messages in thread
From: Sam Halliday @ 2015-08-18 12:06 UTC (permalink / raw)
  To: help-gnu-emacs

On Tuesday, 18 August 2015 12:47:15 UTC+1, to...@tuxteam.de  wrote:
> > We're not sending the message using raw, we're using UTF-8. But I need to calculate the length of the UTF-8 string IN BYTES as part of the payload (each message begins with a 6 character hex encoding of the following string's raw length).
> 
> Yes, I get that. The way I understand encode-coding-string is that you give
> it the target encoding:
> 
>   (length (encode-coding-string foo 'raw-text))


Aah, ok, I didn't get what you were saying. I thought `utf-8' here would just give me back the original. OK, so I really need

  (length (encode-coding-string "EURO" 'utf-8))

and actually, since the process can be using a different encoding, I need

  (length (encode-coding-string "EURO" my-encoding))

Thanks! I already pushed a quick fix, but this seems more solid.



* Re: how to calculate the size of string in bytes?
  2015-08-18  9:11 how to calculate the size of string in bytes? Sam Halliday
  2015-08-18 10:13 ` tomas
       [not found] ` <mailman.8504.1439892841.904.help-gnu-emacs@gnu.org>
@ 2015-08-18 14:34 ` Eli Zaretskii
  2 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2015-08-18 14:34 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Tue, 18 Aug 2015 02:11:54 -0700 (PDT)
> From: Sam Halliday <sam.halliday@gmail.com>
> 
> Hi all,
> 
> We've had to change the ENSIME protocol to be more friendly to other editors and this has meant changing how we frame TCP messages.
> 
> We used to have a 6 character hex number at the start of each message that counted the number of multibyte characters, but we'd like to change it to be the number of bytes in the message.
> 
> We're sending the string to `process-send-string' and `read'ing from the associated network buffer. But when calculating the outgoing length of the string that we want to send, we use `length' --- but we need this to be `length-in-bytes' not the number of multibyte chars. Is there a built in function to do this or am I going to have to iterate the string and count the byte size of each character?

Emacs 25 has bufferpos-to-filepos, which I think does what you want.
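
A hedged sketch of how that function might be applied to a string: `bufferpos-to-filepos' works on buffer positions, so the hypothetical helper below copies the string into a temporary buffer first. It assumes Emacs 25 or later and a coding system with an explicit end-of-line type such as `utf-8-unix'.

```elisp
(defun my-byte-length-via-filepos (string coding-system)
  "Return the size in bytes STRING would have when written in CODING-SYSTEM.
Needs Emacs 25 or later for `bufferpos-to-filepos'."
  (with-temp-buffer
    (insert string)
    ;; File positions are 0-based, so the file position of `point-max'
    ;; is the number of bytes that precede the end of the buffer.
    (bufferpos-to-filepos (point-max) 'exact coding-system)))

(my-byte-length-via-filepos "\u20aca\n" 'utf-8-unix)
```

For a plain string, measuring the result of `encode-coding-string' is likely simpler; this form is mostly of interest when the text already lives in a buffer.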

> A quick test shows that
> 
>   (length (encode-coding-string "EURO" 'raw-text))
> 
> seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for Euro), but I am not 100% sure if this is correct.

It will fail if the string includes some exotic characters or raw
bytes.




* Re: how to calculate the size of string in bytes?
  2015-08-18 10:13 ` tomas
@ 2015-08-18 14:37   ` Eli Zaretskii
  2015-08-18 14:45     ` tomas
  2015-08-18 21:47   ` Stefan Monnier
       [not found]   ` <mailman.8577.1439934462.904.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-08-18 14:37 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Tue, 18 Aug 2015 12:13:52 +0200
> From: <tomas@tuxteam.de>
> Cc: help-gnu-emacs@gnu.org
> 
> Raw is, afaik, Emacs's internal coding system.

Almost, with the exception of raw bytes and characters from un-unified
CJK charsets.




* Re: how to calculate the size of string in bytes?
  2015-08-18 14:37   ` Eli Zaretskii
@ 2015-08-18 14:45     ` tomas
  2015-08-18 15:00       ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: tomas @ 2015-08-18 14:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 05:37:11PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 18 Aug 2015 12:13:52 +0200
> > From: <tomas@tuxteam.de>
> > Cc: help-gnu-emacs@gnu.org
> > 
> > Raw is, afaik, Emacs's internal coding system.
> 
> Almost, with the exception of raw bytes and characters from un-unified
> CJK charsets.

Right, those get mapped to something non-UTF-8. Thanks for the clarification

Perhaps you know that off-hand, but I can look it up/try it out: probably
encode-coding-string and process-send-string would both fail, if the target
coding system is set to UTF-8?

regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXTRQoACgkQBcgs9XrR2kbuvACdGZH9gt7pKKD8kYedVDstH6yk
o9kAn1Y28MywYTJZGn52s121SyOUo57C
=EYVr
-----END PGP SIGNATURE-----




* Re: how to calculate the size of string in bytes?
  2015-08-18 14:45     ` tomas
@ 2015-08-18 15:00       ` Eli Zaretskii
  2015-08-18 16:01         ` tomas
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-08-18 15:00 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Tue, 18 Aug 2015 16:45:30 +0200
> Cc: help-gnu-emacs@gnu.org
> From:  <tomas@tuxteam.de>
> 
> Perhaps you know that off-hand, but I can look it up/try it out: probably
> encode-coding-string and process-send-string would both fail, if the target
> coding system is set to UTF-8?

Why should it fail?  In which use case?  I'm probably missing
something.




* Re: how to calculate the size of string in bytes?
  2015-08-18 15:00       ` Eli Zaretskii
@ 2015-08-18 16:01         ` tomas
  2015-08-18 16:35           ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: tomas @ 2015-08-18 16:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 06:00:02PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 18 Aug 2015 16:45:30 +0200
> > Cc: help-gnu-emacs@gnu.org
> > From:  <tomas@tuxteam.de>
> > 
> > Perhaps you know that off-hand, but I can look it up/try it out: probably
> > encode-coding-string and process-send-string would both fail, if the target
> > coding system is set to UTF-8?
> 
> Why should it fail?  In which use case?  I'm probably missing
> something.

Perhaps I should do my homework better before asking stupid questions :-)

I was thinking of "characters not expressible in UTF-8". Does Emacs have
those?

regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXTVukACgkQBcgs9XrR2kYLUACeNqxXdwZHjA/e/slUThyeS9KU
JqsAni4tgUKj8QcX6PENuQYNsg4lmefS
=LWIU
-----END PGP SIGNATURE-----




* Re: how to calculate the size of string in bytes?
  2015-08-18 16:01         ` tomas
@ 2015-08-18 16:35           ` Eli Zaretskii
  2015-08-18 19:30             ` tomas
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-08-18 16:35 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Tue, 18 Aug 2015 18:01:45 +0200
> Cc: help-gnu-emacs@gnu.org
> From:  <tomas@tuxteam.de>
> 
> I was thinking of "characters not expressible in UTF-8". Does Emacs have
> those?

Raw bytes come out as themselves (which might be invalid UTF-8), but
that's not a failure, that's the user's fault, because they had those
bytes in the buffer to begin with.
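
To make this concrete, here is a small sketch with a hypothetical helper, `my-byte-counts', that reports three numbers for a string: its length in characters, its size in Emacs's internal representation, and its size after encoding as UTF-8.

```elisp
(defun my-byte-counts (string)
  "Return a list (CHARS INTERNAL-BYTES UTF8-BYTES) for STRING."
  (list (length string)
        (string-bytes string)
        (length (encode-coding-string string 'utf-8-unix))))

;; An ordinary Unicode character: internal and encoded sizes agree.
(my-byte-counts "\u20ac")

;; A multibyte string holding the raw byte 0x80: it takes two bytes
;; internally but is emitted as the single byte 0x80, which is not
;; valid UTF-8 on its own.
(my-byte-counts (string (unibyte-char-to-multibyte #x80)))
```

The first call returns (1 3 3) and the second (1 2 1), so a length header computed from the encoded string still matches what is sent, even though the receiver may reject the payload as malformed UTF-8.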




* Re: how to calculate the size of string in bytes?
  2015-08-18 16:35           ` Eli Zaretskii
@ 2015-08-18 19:30             ` tomas
  2015-08-18 19:49               ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: tomas @ 2015-08-18 19:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 07:35:03PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 18 Aug 2015 18:01:45 +0200
> > Cc: help-gnu-emacs@gnu.org
> > From:  <tomas@tuxteam.de>
> > 
> > I was thinking of "characters not expressible in UTF-8". Does Emacs have
> > those?
> 
> Raw bytes come out as themselves (which might be invalid UTF-8), but
> that's not a failure, that's the user's fault, because they had those
> bytes in the buffer to begin with.

I was having difficulties in understanding you, so I tried it out. Now
I understand: Emacs's internal (raw) coding system can represent "characters
not expressible in utf-8". The function encode-coding-string passes those
bytes silently through, outputting an invalid utf-8 sequence.

So I venture the guess that when the Emacs buffer contains something
expressible as valid utf-8, 'utf-8 and 'raw are equivalent (what about
combining characters?)

Thanks for the insights
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXTh+kACgkQBcgs9XrR2kZH2QCcDjlnu5BP0UxHnBweCdE9revf
sYoAn0fwO/WeoGirGfLlqA3lH1Cp9Bco
=IAVl
-----END PGP SIGNATURE-----




* Re: how to calculate the size of string in bytes?
  2015-08-18 19:30             ` tomas
@ 2015-08-18 19:49               ` Eli Zaretskii
  2015-08-18 20:11                 ` tomas
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2015-08-18 19:49 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Tue, 18 Aug 2015 21:30:49 +0200
> Cc: help-gnu-emacs@gnu.org
> From:  <tomas@tuxteam.de>
> 
> I was having difficulties in understanding you

Sorry about that.  It's a complex issue to explain in a few words.

> Now I understand: Emacs's internal (raw) coding system can represent
> "characters not expressible in utf-8".

More accurately, it can represent characters outside the Unicode code
space.

And please don't call that "raw"; the internal representation of
characters used by Emacs is known as 'utf-8-emacs'.

> The function encode-coding-string passes those bytes silently
> through, outputting an invalid utf-8 sequence.

Yes.  Although in interactive functions Emacs will normally complain
and ask for a better encoding.

> So I venture the guess that when the Emacs buffer contains something
> expressible as valid utf-8, 'utf-8 and 'raw are equivalent

Yes.

> (what about combining characters?)

Emacs doesn't normalize/compose/decompose characters when it encodes
text (with a notable exception of the utf-8-hfs encoding).
Applications that want this should do that themselves, e.g. using the
facilities in ucs-normalize.el.
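
A short sketch of why this matters for byte counts: the same visible text can encode to different lengths depending on whether it is composed or decomposed. The helper name `my-utf8-length-nfc' is hypothetical; `ucs-normalize-NFC-string' comes from ucs-normalize.el.

```elisp
(require 'ucs-normalize)

(defun my-utf8-length-nfc (string)
  "Return the UTF-8 byte length of STRING after NFC normalization."
  (length (encode-coding-string (ucs-normalize-NFC-string string)
                                'utf-8-unix)))

;; "e" followed by COMBINING ACUTE ACCENT is encoded as is: 1 + 2 bytes.
(length (encode-coding-string "e\u0301" 'utf-8-unix))

;; After NFC the pair becomes the single character U+00E9: 2 bytes.
(my-utf8-length-nfc "e\u0301")
```

Whether to normalize at all is a protocol decision; the point is only that the byte count must be taken from exactly the string that is sent.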


* Re: how to calculate the size of string in bytes?
  2015-08-18 19:49               ` Eli Zaretskii
@ 2015-08-18 20:11                 ` tomas
  0 siblings, 0 replies; 19+ messages in thread
From: tomas @ 2015-08-18 20:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 10:49:58PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 18 Aug 2015 21:30:49 +0200
> > Cc: help-gnu-emacs@gnu.org
> > From:  <tomas@tuxteam.de>
> > 
> > I was having difficulties in understanding you
> 
> Sorry about that.  It's a complex issue to explain in a few words.

No need to be sorry. The fault's on me -- once I did my homework
things improved :-)

Thanks for your patience: very much appreciated.

> > Now I understand: Emacs's internal (raw) coding system can represent
> > "characters not expressible in utf-8".
> 
> More accurately, it can represent characters outside the Unicode code
> space.
> 
> And please don't call that "raw"; the internal representation of
> characters used by Emacs is known as 'utf-8-emacs'.

Ah, OK. Point taken.

> > The function encode-coding-string passes those bytes silently
> > through, outputting an invalid utf-8 sequence.
> 
> Yes.  Although in interactive functions Emacs will normally complain
> and ask for a better encoding.

Understood

> > So I venture the guess that when the Emacs buffer contains something
> > expressible as valid utf-8, 'utf-8 and 'raw are equivalent
> 
> Yes.
> 
> > (what about combining characters?)
> 
> Emacs doesn't normalize/compose/decompose characters when it encodes
> text (with a notable exception of the utf-8-hfs encoding).
> Applications that want this should do that themselves, e.g. using the
> facilities in ucs-normalize.el.

Thanks: I learned quite a bit now :-)

regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEUEARECAAYFAlXTkWYACgkQBcgs9XrR2kaQbwCggSK12zVBjHiFowFVsddq36SJ
XmAAmON/V8XcGaUfjxW1llhEavSqcp0=
=fYz9
-----END PGP SIGNATURE-----




* Re: how to calculate the size of string in bytes?
  2015-08-18 10:13 ` tomas
  2015-08-18 14:37   ` Eli Zaretskii
@ 2015-08-18 21:47   ` Stefan Monnier
  2015-08-19  5:43     ` tomas
       [not found]   ` <mailman.8577.1439934462.904.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2015-08-18 21:47 UTC (permalink / raw)
  To: help-gnu-emacs

> It seems somewhat wasteful to encode msg (to find its length) just
> to let process-send-string encode again -- perhaps there's a better
> idiom around for that.

Yup: communicate with the process using bytes rather than chars!

I.e. set the process's coding system to binary.

Then you just need to call (encode-coding-string msg coding-system) once
to get the bytes and you send them as is: they won't be re-encoded.
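
A minimal sketch of that approach, assuming the six-hex-digit header described at the start of the thread and UTF-8 as the wire encoding. The function names are hypothetical, and the process is assumed to have been switched to `binary' beforehand.

```elisp
(defun my-net-encode-frame (msg coding-system)
  "Encode MSG in CODING-SYSTEM once and return header plus payload as bytes.
The header is the payload's byte count as six zero-padded hex digits."
  (let ((bytes (encode-coding-string msg coding-system)))
    (concat (format "%06x" (length bytes)) bytes)))

(defun my-net-send-binary (msg proc)
  "Send MSG to PROC as a pre-encoded UTF-8 frame.
Assumes PROC's coding systems were set to `binary' when the connection
was made, e.g. with (set-process-coding-system proc 'binary 'binary),
so `process-send-string' passes the bytes through unchanged."
  (process-send-string proc (my-net-encode-frame msg 'utf-8-unix)))
```

Because the payload is encoded exactly once, the number in the header and the bytes on the wire come from the same string and cannot disagree.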


        Stefan





* Re: how to calculate the size of string in bytes?
  2015-08-18 21:47   ` Stefan Monnier
@ 2015-08-19  5:43     ` tomas
  0 siblings, 0 replies; 19+ messages in thread
From: tomas @ 2015-08-19  5:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 05:47:27PM -0400, Stefan Monnier wrote:
> > It seems somewhat wasteful to encode msg (to find its length) just
> > to let process-send-string encode again -- perhaps there's a better
> > idiom around for that.
> 
> Yup: communicate with the process using bytes rather than chars!
> 
> I.e. set the process's coding system to binary.
> 
> Then you just need to call (encode-coding-string msg coding-system) once
> to get the bytes and you send them as is: they won't be re-encoded.

(pats forehead) Of course! Thanks, Stefan.

regards
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXUF5MACgkQBcgs9XrR2kYb6ACfakO/BHVsih4M7IPDxJfotIPD
I8kAnRYDmQF6VAnzXncPvMSjJjAOLXXS
=h0oY
-----END PGP SIGNATURE-----




* Re: how to calculate the size of string in bytes?
       [not found]   ` <mailman.8577.1439934462.904.help-gnu-emacs@gnu.org>
@ 2015-08-19  8:57     ` Sam Halliday
  2015-08-19  9:22       ` Sam Halliday
  2015-08-19 19:47       ` Stefan Monnier
  0 siblings, 2 replies; 19+ messages in thread
From: Sam Halliday @ 2015-08-19  8:57 UTC (permalink / raw)
  To: help-gnu-emacs

On Tuesday, 18 August 2015 22:47:44 UTC+1, Stefan Monnier  wrote:
> > It seems somewhat wasteful to encode msg (to find its length) just
> > to let process-send-string encode again -- perhaps there's a better
> > idiom around for that.
> 
> Yup: communicate with the process using bytes rather than chars!
> 
> I.e. set the process's coding system to binary.
> 
> Then you just need to call (encode-coding-string msg coding-system) once
> to get the bytes and you send them as is: they won't be re-encoded.
> 
> 
>         Stefan

Heh, that's actually a very good suggestion. We'll keep that in mind if this is ever a performance bottleneck. We're hoping to move the ENSIME protocol (based on SWANK) over to S-Expressions over WebSockets which will mean we can just delete all this code.



* Re: how to calculate the size of string in bytes?
  2015-08-19  8:57     ` Sam Halliday
@ 2015-08-19  9:22       ` Sam Halliday
  2015-08-19 19:47       ` Stefan Monnier
  1 sibling, 0 replies; 19+ messages in thread
From: Sam Halliday @ 2015-08-19  9:22 UTC (permalink / raw)
  To: help-gnu-emacs

Actually, one question Stefan.

An advantage of the string encodings is that we're pretty confident that a newline will flush the network buffer. How do we make sure that a binary encoding will do the same? (Or is there no buffering and we're worrying about nothing?)


On Wednesday, 19 August 2015 09:57:38 UTC+1, Sam Halliday  wrote:
> On Tuesday, 18 August 2015 22:47:44 UTC+1, Stefan Monnier  wrote:
> > > It seems somewhat wasteful to encode msg (to find its length) just
> > > to let process-send-string encode again -- perhaps there's a better
> > > idiom around for that.
> > 
> > Yup: communicate with the process using bytes rather than chars!
> > 
> > I.e. set the process's coding system to binary.
> > 
> > Then you just need to call (encode-coding-string msg coding-system) once
> > to get the bytes and you send them as is: they won't be re-encoded.
> > 
> > 
> >         Stefan
> 
> Heh, that's actually a very good suggestion. We'll keep that in mind if this is ever a performance bottleneck. We're hoping to move the ENSIME protocol (based on SWANK) over to S-Expressions over WebSockets which will mean we can just delete all this code.




* Re: how to calculate the size of string in bytes?
  2015-08-19  8:57     ` Sam Halliday
  2015-08-19  9:22       ` Sam Halliday
@ 2015-08-19 19:47       ` Stefan Monnier
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2015-08-19 19:47 UTC (permalink / raw)
  To: help-gnu-emacs

> Heh, that's actually a very good suggestion. We'll keep that in mind if this
> is ever a performance bottleneck.

Actually, I recommend it for sanity reasons rather than
performance reasons.
It'll help you make sure the right encoding is used for the right data,
and the "counts" do count the right elements as well.
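
On the receiving side the same idea might look like the sketch below. It assumes the process buffer is unibyte (for example after calling (set-buffer-multibyte nil) in it) and that the process decodes with `binary', so one buffer position corresponds to one byte and no `byte-to-position' conversion is needed. `my-net-read-frame' is a hypothetical name, and the six-hex-digit header follows the framing described at the start of the thread.

```elisp
(defun my-net-read-frame ()
  "Extract one frame from the start of the current unibyte buffer.
Return the payload decoded as UTF-8 and delete the frame from the
buffer, or return nil if a complete frame has not arrived yet."
  (let ((header-end (+ (point-min) 6)))
    (when (>= (point-max) header-end)
      (let* ((len (string-to-number
                   (buffer-substring-no-properties (point-min) header-end)
                   16))
             (end (+ header-end len)))
        (when (>= (point-max) end)
          ;; Decode only after the exact byte range has been cut out.
          (prog1 (decode-coding-string
                  (buffer-substring-no-properties header-end end)
                  'utf-8-unix)
            (delete-region (point-min) end)))))))
```

The decoded string can then be passed to `read-from-string'; keeping the buffer in bytes until a whole frame is available means the header count and the buffer positions refer to the same units.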


        Stefan





