From: Oleksandr Gavenko
Newsgroups: gmane.emacs.help
Subject: Re: Look for data serialisation format to implement communication between Emacs and external program.
Date: Mon, 07 Jan 2013 15:53:57 +0200
Organization: Oleksandr Gavenko, http://gavenkoa.users.sf.net
Message-ID: <87wqvprte2.fsf@gavenkoa.example.com>
To: help-gnu-emacs@gnu.org

On 2013-01-07, Helmut Eller wrote:

> On Sun, Jan 06 2013, Oleksandr Gavenko wrote:
>
>> Is that right to use ASN.1 BER as serialisation data format for communication
>> between Emacs and external program?
>
> S-expressions is the only format that Emacs can write and parse quickly
> because the printer and reader are implemented in C.  This is likely 10
> times faster than any parser that you write in Emacs Lisp.  The downside
> is that the external program needs to be able to do the same.  Not such
> a bad tradeoff as S-expressions are fairly easy to parse.
>
> For communication with an external format I recommend a "framed" format:
> a frame is a fixed sized header followed by a variable length payload.
> The header describes the length of the frame.  The length should be in
> bytes (not characters as counting characters in UTF8 strings is
> unnecessarily complicated).
> Knowing the length of the frame is very useful because that makes it
> easy to wait for a complete frame.  After you received a complete
> frame, parsing is simpler because you don't have to worry about
> incomplete input.
>
> I also recommend to limit the frame length to 24 bits (not 32 bit)
> because Emacs fixnums are limited to 29 bits on 32 bit machines.
>
> The payload can then be an S-expression printed with the Emacs prin1 and
> parsed back with the read function.  The encoding of the payload can be
> utf-8.  But use the Emacs 'binary coding system for communication with
> the external process and unibyte buffers for parsing.  For the
> binary-to-utf8 conversion of the payload use something like
> decode-coding-string (which is written in C and should be fast).
>

This seems to be a good solution in the case of Emacs:

  (assoc ':title (read "((:type blog-entry) (:title \"Hello\") (:article \"world!\"))"))

Data validation:

  (read ")")   ;; ==> invalid-read-syntax

or when assoc returns an unknown ":type", etc...

The only annoying thing is escaping (like &lt;div&gt;hello&lt;/div&gt; for
<div>hello</div> in XML, or as in PPP's HDLC-like framing, where 0x7e is
escaped as 0x7d 0x5e and the escape character 0x7d itself is escaped as
0x7d 0x5d).

> If you like, you can also use extra bits in the header to indicate the
> format of the payload.  E.g. it might be useful to have frames that
> contain only plain strings (not encoded as S-expr).
>

I started with a custom TLV data format, but its parsing and validation
are hand-written, so I decided to ask for suggestions...

-- 
Best regards!
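
P.S. A minimal sketch of how the external program might handle the
framed format described above. Everything specific here is an
assumption for illustration and was not fixed in this thread: Python
as the language of the external program, a 3-byte big-endian length
header (24 bits, as suggested), and the hypothetical names
encode_frame and FrameDecoder. The payload is treated as opaque UTF-8
text; turning it into an S-expression is left to whatever reader the
program uses.

```python
HEADER_SIZE = 3                    # assumption: 24-bit big-endian length
MAX_PAYLOAD = (1 << 24) - 1        # largest length that fits the header


def encode_frame(payload_text):
    """Return header + payload; the length counts bytes, not characters."""
    payload = payload_text.encode("utf-8")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in a 24-bit length")
    return len(payload).to_bytes(HEADER_SIZE, "big") + payload


class FrameDecoder:
    """Collect raw bytes and hand out only complete frames."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add received bytes; return payloads of all frames now complete."""
        self._buffer.extend(chunk)
        frames = []
        while len(self._buffer) >= HEADER_SIZE:
            length = int.from_bytes(self._buffer[:HEADER_SIZE], "big")
            end = HEADER_SIZE + length
            if len(self._buffer) < end:
                break              # incomplete frame: wait for more input
            # Decode only after the whole payload has arrived, so a
            # multi-byte UTF-8 character is never split across reads.
            frames.append(bytes(self._buffer[HEADER_SIZE:end]).decode("utf-8"))
            del self._buffer[:end]
        return frames
```

Because the length is known up front, nothing inside the payload needs
escaping at the framing level; only the string escapes of the
S-expression itself remain. On the Emacs side the payload would be the
text produced by prin1 and handed to read after decoding.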