Newsgroups: gmane.emacs.help
Subject: Re: how to calculate the size of string in bytes?
Date: Tue, 18 Aug 2015 12:13:52 +0200
Message-ID: <20150818101352.GA6744@tuxteam.de>
To: Sam Halliday
Cc: help-gnu-emacs@gnu.org

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> Hi all,
>
> We've had to change the ENSIME protocol to be more friendly to other
> editors and this has meant changing how we frame TCP messages.
>
> We used to have a 6 character hex number at the start of each message
> that counted the number of multibyte characters, but we'd like to
> change it to be the number of bytes in the message.
>
> We're sending the string to `process-send-string' and `read'ing from
> the associated network buffer. But when calculating the outgoing
> length of the string that we want to send, we use `length' --- but we
> need this to be `length-in-bytes' not the number of multibyte chars.
> Is there a built in function to do this or am I going to have to
> iterate the string and count the byte size of each character?
>
> A quick test shows that
>
>   (length (encode-coding-string "EURO" 'raw-text))
>
> seems to give the correct result (1 for ASCII, 2 for Pound Sterling,
> 3 for Euro), but I am not 100% sure if this is correct.

Raw is, afaik, Emacs's internal coding system. You don't want traces
of it in the network :-)

I'd expect you to use whichever coding system the network protocol
prescribes (these days it'd be UTF-8 by default). Things will (mostly)
work for raw-text since it's nearly UTF-8.
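To illustrate why the choice of coding system matters for the length, here is a minimal sketch. The helper name `my-encoded-length' is made up for this example, and `utf-8-unix' and `utf-16le' are only sample choices; the right one is whatever the protocol prescribes.

```elisp
;; Hypothetical helper, not part of Emacs or ENSIME.
(defun my-encoded-length (string coding-system)
  "Return the number of bytes STRING occupies once encoded in CODING-SYSTEM."
  ;; `encode-coding-string' returns a unibyte string, so `length' counts bytes.
  (length (encode-coding-string string coding-system)))

;; Compare character count and byte counts for an ASCII letter,
;; the pound sign (U+00A3) and the euro sign (U+20AC).
(dolist (s '("a" "\u00a3" "\u20ac"))
  (message "%S: %d char(s), %d byte(s) in utf-8-unix, %d byte(s) in utf-16le"
           s
           (length s)
           (my-encoded-length s 'utf-8-unix)
           (my-encoded-length s 'utf-16le)))
```

The same string has a different byte length under each coding system, so the length prefix is only meaningful for the coding system actually used on the wire.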
The really correct way to do this (AFAICS) would be to find out which
encoding process-send-string is going to use (via
process-coding-system, which returns a (DECODING . ENCODING) pair) and
use *that* in the length calculation -- this way you won't lie :-)

So I'd try this (slightly reordering the let*)

  (let* ((msg (concat (ensime-prin1-to-string sexp) "\n"))
         (coding-system (cdr (process-coding-system proc)))
         (string (concat (ensime-net-encode-length
                          (length (encode-coding-string msg coding-system)))
                         msg))
         ...

It seems somewhat wasteful to encode msg (to find its length) just to
let process-send-string encode it again -- perhaps there's a better
idiom around for that. The use case seems common enough. Anyone?

regards
- -- 
tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlXTBWAACgkQBcgs9XrR2kYjzACfVd/+R0wNKqWVt5sXxX/9WVj2
OjQAnRRuUdorjnIjd+tpL4z7frx1JGYZ
=yjMt
-----END PGP SIGNATURE-----
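One possible way to avoid encoding twice is to encode once and prepend the length to the encoded bytes themselves. This is only a sketch under assumptions: `my-frame-message' is a hypothetical name, not ENSIME's actual code, and the six-hex-digit prefix is taken from the description in the quoted question.

```elisp
;; Hypothetical helper, not part of Emacs or ENSIME.
(defun my-frame-message (payload coding-system)
  "Return PAYLOAD encoded in CODING-SYSTEM, prefixed with its byte length.
The prefix is six hexadecimal digits (assumed from the quoted question)."
  ;; Encode exactly once; the result is a unibyte string of raw bytes.
  (let ((bytes (encode-coding-string payload coding-system)))
    ;; The ASCII prefix and the encoded payload are both unibyte,
    ;; so `length' on BYTES is the number of bytes that follow the prefix.
    (concat (format "%06x" (length bytes)) bytes)))

;; Example: the euro sign plus a newline is 4 bytes in UTF-8.
(message "%S" (my-frame-message "\u20ac\n" 'utf-8-unix))
```

The framed result could then be handed to process-send-string. As far as I know Emacs does not apply character conversion again to a unibyte string, but end-of-line conversion may still be applied according to the process coding system, so that is worth checking before relying on it.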