From: Sam Halliday
Newsgroups: gmane.emacs.help
Subject: Re: how to calculate the size of string in bytes?
Date: Tue, 18 Aug 2015 03:43:44 -0700 (PDT)
To: help-gnu-emacs@gnu.org

On Tuesday, 18 August 2015 11:14:04 UTC+1, to...@tuxteam.de wrote:
> On Tue, Aug 18, 2015 at 02:11:54AM -0700, Sam Halliday wrote:
> > We used to have a 6 character hex number at the start of each message that counted the number of multibyte characters, but we'd like to change it to be the number of bytes in the message.
> >
> > We're sending the string to `process-send-string' and `read'ing from the associated network buffer. But when calculating the outgoing length of the string that we want to send, we use `length' --- but we need this to be `length-in-bytes', not the number of multibyte chars. Is there a built-in function to do this or am I going to have to iterate the string and count the byte size of each character?
> >
> > A quick test shows that
> >
> > (length (encode-coding-string "€" 'raw-text))
> >
> > seems to give the correct result (1 for ASCII, 2 for Pound Sterling, 3 for Euro), but I am not 100% sure if this is correct.
>
> Raw is, afaik, Emacs's internal coding system. You don't want traces of it
> in the network :-)

We're not sending the message using raw, we're using UTF-8. But I need to calculate the length of the UTF-8 string IN BYTES as part of the payload (each message begins with a 6 character hex encoding of the following string's raw length).
I'm using "raw" to calculate an approximation of the UTF-8 string's byte length, but I am aware that it might not actually be true in the general case :-/

I don't think what you've suggested would actually change the semantics, but it would allow us to use a different encoding on the wire than the encoding of the string. We don't really need to worry about that at this stage, because all our users are using UTF-8. We'll keep it in mind though.
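
For reference, here is a minimal sketch of the length calculation discussed in this message, assuming the wire encoding is UTF-8. The helper names `my-utf8-byte-length' and `my-length-header' are hypothetical and do not come from the thread; `encode-coding-string', `string-bytes' and `format' are standard Emacs Lisp functions. Encoding with `utf-8' instead of `raw-text' should measure the bytes that actually go over the wire, rather than an approximation of them.

```elisp
;; Hypothetical helpers, not from the original thread.
;; Assumption: the payload is sent over the wire as UTF-8.

(defun my-utf8-byte-length (string)
  "Return the number of bytes STRING occupies when encoded as UTF-8."
  ;; `encode-coding-string' returns a unibyte string, so `string-bytes'
  ;; on the result counts bytes rather than characters.
  (string-bytes (encode-coding-string string 'utf-8)))

(defun my-length-header (string)
  "Return the UTF-8 byte length of STRING as a 6-character hex string."
  (let ((nbytes (my-utf8-byte-length string)))
    ;; Six hex digits can represent at most #xffffff bytes.
    (when (> nbytes #xffffff)
      (error "Message too long for a 6-digit hex header: %d bytes" nbytes))
    (format "%06x" nbytes)))

(my-utf8-byte-length "a")       ; => 1 (ASCII)
(my-utf8-byte-length "\u00a3")  ; => 2 (Pound Sterling)
(my-utf8-byte-length "\u20ac")  ; => 3 (Euro)
(my-length-header "\u20ac")     ; => "000003"
```

The header string could then be sent ahead of the message itself, provided the process's coding system is also UTF-8, as described above for the current users.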