From: Yuri Khan
Newsgroups: gmane.emacs.tangents
Subject: Negating a regexp
Date: Thu, 20 May 2021 17:29:50 +0700
To: tomas@tuxteam.de
Cc: emacs-tangents@gnu.org, steve-humphreys@gmx.com

> > How could I negate the regexp that I have defined?
>
> I don't even know what you mean by "negating a regexp".

Automata theory, where the notion of a regexp comes from, defines a
“regular set” as one that can be described using a regexp (where
regexps are defined to be implicitly anchored to the beginning and end
of the string, and support concatenation, alternation, and iteration,
but not backreferences). It then proves that for each regexp there
exists an equivalent nondeterministic finite automaton (NDFA), for
every NDFA there exists an equivalent deterministic one (DFA), and for
every DFA there exists an equivalent regexp. It also proves that the
class of regular sets is closed under the set-theoretic operations of
union, intersection, and complement.

It follows that, theoretically, a regexp ‘R’ can be negated: one can
construct a regexp ‘(not R)’ that matches exactly those strings that R
does not match. However, the proofs and constructions are complex
enough that in practice such a regexp would be unreadable.

As an example, consider the regexp ‘a’, which matches only a single
one-character string. Its complement would be a regexp that matches
the empty string, all one-character strings whose character is not
‘a’, and all strings longer than one character. The shortest way to
express that idea that I can think of is ‘|[^a]|.{2,}’, and that’s
ignoring the issue of newlines.

So, to steve-humphreys: There is no practical general way to negate a
regexp.
You need to either negate the result of attempting to match, or to think hard and write a new regexp that matches what you want.
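
To make the two options concrete, here is a minimal sketch. It uses
Python's `re` module rather than Emacs regexps, purely as an assumption
for illustration, and the helper names `does_not_match` and
`strings_up_to` are made up for this example. It negates the result of
a match attempt for ‘a’, then checks by brute force, over a small
newline-free alphabet, that the hand-written complement agrees with it.
`fullmatch` stands in for the implicit anchoring mentioned above.

```python
import re
from itertools import product

# The regexp to negate and the hand-written complement from the message.
R = re.compile(r"a")
NOT_R = re.compile(r"|[^a]|.{2,}")


def does_not_match(pattern, text):
    """Negate the result of a match attempt instead of negating the regexp."""
    return pattern.fullmatch(text) is None


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


# Collect every short string on which the two approaches disagree.
disagreements = [
    s
    for s in strings_up_to("ab", 3)
    if does_not_match(R, s) != (NOT_R.fullmatch(s) is not None)
]
print(disagreements)  # prints []
```

The first approach needed no thought about what the complement looks
like, which is why it is usually the practical choice. The check only
covers strings up to length 3 over two characters, so it illustrates
the agreement and is not a proof of it.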