From: Chris Dennis
Newsgroups: gmane.lisp.guile.user
Subject: Re: Guile regular expressions are too greedy
Date: Wed, 22 Jul 2009 14:12:19 +0100
Message-ID: <4A671033.6010503@btinternet.com>
References: <4A66E467.2060709@btinternet.com> <87zlawj4cd.fsf@gnu.org>
In-Reply-To: <87zlawj4cd.fsf@gnu.org>
To: Ludovic Courtès
Cc: guile-user@gnu.org

Ludovic Courtès wrote:
> Hello,
>
> Chris Dennis writes:
>
>> Hello Guile People
>>
>> Is there a way to make Guile regular expressions less greedy?  I
>> understand that POSIX doesn't define non-greedy modifiers.
>>
>> Specifically, I'm trying to parse font names such as
>>
>>     Arial 12
>>     Arial Bold Italic 14
>>     Nimbus Sans L Bold Italic Condensed 11
>>
>> so that I can construct CSS styles from them.
>>
>> I've tried the following, but the first (.*) gobbles up everything
>> before the size because the other elements are optional:
>>
>>   (define s
>>     (string-match
>>      "(.*)( +(bold|semi-bold|regular|light))?( +(italic|oblique))?( +(condensed))? +([0-9]+)"
>>      "nimbus sans l bold italic condensed 11"))
>
> Here's a possible solution:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define rx (make-regexp "^([^ ]+)( +(bold|semi-bold|regular|light))?( +(italic|oblique))?( +(condensed))? +([0-9]+)" regexp/extended))
> scheme@(guile-user)> (define m (regexp-exec rx "nimbus bold italic condensed 11"))
> scheme@(guile-user)> (match:substring m 1)
> "nimbus"
> scheme@(guile-user)> (match:substring m 2)
> " bold"
> scheme@(guile-user)> (match:substring m 3)
> "bold"
> scheme@(guile-user)> (match:substring m 4)
> " italic"
> scheme@(guile-user)> (match:substring m 5)
> "italic"
> --8<---------------cut here---------------end--------------->8---
>
> Note that I slightly modified the string that's matched because "sans"
> and "l" are not meant to be matched.
>
> Hope this helps,
> Ludo'.

Thanks for the reply.

Unfortunately, the name of the font really is "Nimbus Sans L" -- it's
the fact that font names can contain spaces that causes the problem,
and means that I can't use ([^ ]+) to match the font name.

cheers

Chris
-- 
Chris Dennis                                  cgdennis@btinternet.com
Fordingbridge, Hampshire, UK
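
One possible way around the greedy (.*) is sketched below. It is not taken from the thread, and it rests on a few assumptions. The idea is to drop the leading group altogether and match only the tail of the string (optional keywords plus the size, anchored with $). Because a POSIX regex search reports the leftmost position at which a match exists, the match should start at the first keyword, and whatever precedes it is the font family, spaces included. The names font-tail-rx and parse-font are hypothetical. The keyword list is copied from the pattern quoted above, and the sketch assumes a font family never ends in one of those keywords.

```scheme
(use-modules (ice-9 regex))

;; Pattern for the trailing part only: optional weight, slant and
;; stretch keywords, then the mandatory size, anchored at the end.
;; There is deliberately no leading (.*) group.
(define font-tail-rx
  (make-regexp
   "( +(bold|semi-bold|regular|light))?( +(italic|oblique))?( +(condensed))? +([0-9]+)$"
   regexp/extended regexp/icase))

;; parse-font is a hypothetical helper, not part of Guile.
;; Returns a list (family weight slant stretch size), with #f for
;; keywords that are absent, or #f if STR does not end in a size.
(define (parse-font str)
  (let ((m (regexp-exec font-tail-rx str)))
    (and m
         (list (match:prefix m)        ; text before the matched tail
               (match:substring m 2)   ; weight, or #f
               (match:substring m 4)   ; slant, or #f
               (match:substring m 6)   ; stretch, or #f
               (string->number (match:substring m 7))))))

(parse-font "Nimbus Sans L Bold Italic Condensed 11")
;; => ("Nimbus Sans L" "Bold" "Italic" "Condensed" 11)
(parse-font "Arial 12")
;; => ("Arial" #f #f #f 12)
```

The family comes from match:prefix rather than from a capture group, so nothing in the pattern competes with the optional keywords for the same text. A family whose last word happens to be a keyword (a font literally called "Foo Light", say) would still be split at that word, so this is only a starting point.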