From: Eli Zaretskii
To: Clément Pit--Claudel
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Regular expression libraries
Date: Thu, 15 Dec 2016 22:10:00 +0200
Message-ID: <83wpf17y3r.fsf@gnu.org>
In-reply-to: <01d7e608-04d2-84a4-6143-e954bc9d569f@mit.edu> (message from Clément Pit--Claudel on Thu, 15 Dec 2016 14:00:25 -0500)

> From: Clément Pit--Claudel
> Date: Thu, 15 Dec 2016 14:00:25 -0500
>
> * Emacs has special regexp features based on syntax classes, or the position of the point. These features don't seem to be supported in gnulib nor glibc
> * Emacs uses a gap buffer, whereas gnulib and glibc expects a char* for the subject string

There's one other important aspect: Emacs doesn't use the locale to
implement the likes of [:alnum:], [:print:], etc.  We use our own
functions that access Unicode data stored in specialized char-tables,
see the uni-*.el files.  We also use our own macros to fetch multibyte
characters from buffers and strings, instead of functions from the
standard C library.

> If I understand our current implementation, our regexp.c functions take the moral equivalent of two strings, and match over that, pretending that it's just one large string. In practice, these two strings are the two halves of the gap buffer. Correct?

Yes.
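
As an illustration of the two-string matching confirmed above, here is a
minimal sketch in C.  It is not Emacs's actual code: the names two_seg,
seg_at, and seg_find are hypothetical, and the matcher handles only a
literal byte string, not a regexp or multibyte characters.  It only shows
how a search can treat the text before and after the gap as one logical
string without copying it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a gap buffer's text as two contiguous segments. */
struct two_seg {
  const char *s1; size_t n1;   /* text before the gap */
  const char *s2; size_t n2;   /* text after the gap */
};

/* Fetch the byte at logical position POS, as if s1 and s2 were
   concatenated.  The caller must ensure POS < n1 + n2. */
static char seg_at(const struct two_seg *t, size_t pos)
{
  return pos < t->n1 ? t->s1[pos] : t->s2[pos - t->n1];
}

/* Return the logical offset of the first occurrence of NEEDLE in the
   two segments taken together, or -1 if there is none.  A match may
   straddle the boundary between the segments. */
static long seg_find(const struct two_seg *t, const char *needle)
{
  size_t total = t->n1 + t->n2;
  size_t m = strlen(needle);

  if (m > total)
    return -1;
  for (size_t i = 0; i + m <= total; i++) {
    size_t j = 0;
    while (j < m && seg_at(t, i + j) == needle[j])
      j++;
    if (j == m)
      return (long) i;
  }
  return -1;
}
```

With the segments "hello wo" and "rld", seg_find(&t, "world") reports
logical offset 6, even though the match crosses the boundary between the
two segments.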