From: Thien-Thi Nguyen
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Using libunistring for string comparisons et al
Date: Fri, 18 Mar 2011 09:46:17 +0100
Message-ID: <87ei64u886.fsf@ambire.localdomain>
In-Reply-To: <8762rhjjhn.fsf@netris.org>
To: Mark H Weaver
Cc: Ludovic Courtès, guile-devel@gnu.org

() Mark H Weaver
() Thu, 17 Mar 2011 21:38:28 -0400

   If we may assume that the searched string is valid UTF-8, and
   when only ASCII characters are excluded (e.g. "."), then three
   additional states are required in the generated DFA.  Let us
   call them S1, S2, and S3.

   [handling these states]

   When non-ASCII characters are excluded, additional states must
   be added: one for each unique prefix of the excluded multibyte
   characters.  It's quite straightforward.

I don't understand what "excluded" means here.  Is this a property
of the string, the regexp, the (dynamic, environmental) operation,
or ...?
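
One possible reading of the quoted scheme, as a minimal sketch in Python. It assumes that "excluded" refers to characters a regexp element refuses to match (for instance "." refusing newline), and that S1, S2 and S3 mean "one, two, or three continuation bytes still expected"; the quoted message confirms neither. The function name and the START label are hypothetical, and this is not taken from Guile or libunistring.

```python
def match_one_char(data: bytes, pos: int = 0, excluded_ascii: bytes = b"\n") -> int:
    """Byte-level DFA matching one UTF-8 character not in excluded_ascii.

    Returns the index just past the matched character, or -1 on failure.
    Assumes (as the quoted text does) that data is valid UTF-8.
    """
    state = "START"
    i = pos
    while i < len(data):
        b = data[i]
        i += 1
        if state == "START":
            if b < 0x80:
                # ASCII byte: accept unless it is one of the excluded characters.
                return -1 if b in excluded_ascii else i
            if 0xC0 <= b <= 0xDF:
                state = "S1"   # lead byte of a 2-byte sequence
            elif 0xE0 <= b <= 0xEF:
                state = "S2"   # lead byte of a 3-byte sequence
            elif 0xF0 <= b <= 0xF7:
                state = "S3"   # lead byte of a 4-byte sequence
            else:
                return -1      # stray continuation byte: not a character start
        elif state == "S3":
            state = "S2"       # consumed one continuation byte, two remain
        elif state == "S2":
            state = "S1"       # consumed one continuation byte, one remains
        else:                  # state == "S1"
            return i           # last continuation byte: character complete
    return -1                  # input ended before a character was completed


print(match_one_char("é".encode("utf-8")))   # 2-byte character
print(match_one_char(b"\n"))                 # excluded ASCII character
```

Under this reading, only the ASCII branch ever consults the excluded set, which is why three extra states would suffice regardless of which ASCII characters are excluded. Excluding a multibyte character would need further states that track the specific byte prefixes of that character, which this sketch does not attempt.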