From: phillip.lord@newcastle.ac.uk (Phillip Lord)
Newsgroups: gmane.emacs.devel
Subject: syntax identification (Request for Help)
Date: Tue, 04 Aug 2015 17:49:23 +0100
Message-ID: <87h9ofdmng.fsf@newcastle.ac.uk>
To: emacs-devel@gnu.org

I am trying to improve the syntax identification of omn-mode.el (in elpa).

The omn syntax uses URLs everywhere, and they are written between angle brackets, "<" and ">". Within this syntax, although they are URLs, they have little other meaning. Strictly they are IRIs: identifiers, rather than locations.

IRIs are difficult to identify by regular expression, so I treat them syntactically as strings with this (st is the syntax table):

    (modify-syntax-entry ?\< "|" st)
    (modify-syntax-entry ?\> "|" st)

Strings are nice because I also do this:

    (modify-syntax-entry ?\# "<" st)
    (modify-syntax-entry ?\n ">" st)

That is, # is the comment-start character, but NOT inside an IRI, where it is actually quite common. Identifying IRIs as strings also solves this problem, since comment characters inside strings are not comment characters. Emacs gives me this for free.

This fails, however, in two ways. Firstly, while <url> is correctly identified, so are <url< and >url>. Secondly, "<" and ">" can also be used alone to mean (guess what!) less than or greater than, in an expression like so:

    xsd:integer[>= 0 , <= 18]

Unfortunately, everything between ">" and "<" gets identified as a string.

Stefan added comments to omn-mode saying "We could use a syntax-propertize-function to do more carefully."

Would anyone be willing to explain to me how this works and help me apply it? I found the manual a bit confusing.

I am willing to use space characters to differentiate. IRIs are complex (they have very few rules) but cannot contain spaces. The "facet" bit above (i.e. [>= 0]) can contain spaces, and while it does not need to contain them, I am willing to use this to differentiate between a facet and an IRI as an acceptable compromise.

Any help gratefully received.

Phil
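
For reference, here is a minimal sketch of what Stefan's comment seems to point at, using the space compromise described above. It is not the omn-mode implementation: the names omn-sketch-syntax-table and omn-sketch-mode are hypothetical, and it assumes an IRI is any run of characters between "<" and ">" that contains no whitespace and no further angle bracket. The idea is to make "<" and ">" plain punctuation in the syntax table and let syntax-propertize-rules turn them into string fences only where they really delimit an IRI.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: the omn-sketch-* names are hypothetical, not part of omn-mode.

(defvar omn-sketch-syntax-table
  (let ((st (make-syntax-table)))
    ;; "<" and ">" are ordinary punctuation unless a text property says otherwise.
    (modify-syntax-entry ?\< "." st)
    (modify-syntax-entry ?\> "." st)
    ;; "#" starts a comment that runs to the end of the line.
    (modify-syntax-entry ?\# "<" st)
    (modify-syntax-entry ?\n ">" st)
    st)
  "Syntax table for `omn-sketch-mode'.")

(define-derived-mode omn-sketch-mode prog-mode "OMN-sketch"
  "Illustrative mode that treats <IRI> as a string and [>= 0] as punctuation."
  :syntax-table omn-sketch-syntax-table
  (setq-local comment-start "#")
  ;; Make the parser honour `syntax-table' text properties.
  (setq-local parse-sexp-lookup-properties t)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ;; Assumption: an IRI is "<", then one or more characters that
               ;; are neither whitespace nor angle brackets, then ">".
               ;; Only these two brackets get string-fence syntax.
               ("\\(<\\)[^<> \t\n]+\\(>\\)"
                (1 "|")
                (2 "|")))))
```

With this, a "#" inside a bracketed IRI sits inside a string and so does not start a comment, while the brackets in xsd:integer[>= 0 , <= 18] stay punctuation because each is followed by a space before any closing bracket. A facet written without spaces that happens to contain a "<...>" run would still be misread, which is the compromise mentioned above.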