From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Dmitry Antipov Newsgroups: gmane.emacs.devel Subject: Too permissive compare-strings? Date: Tue, 24 Jun 2014 18:03:16 +0400 Message-ID: <53A98524.4@yandex.ru> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------090700010607000907060706" X-Trace: ger.gmane.org 1403618641 27875 80.91.229.3 (24 Jun 2014 14:04:01 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 24 Jun 2014 14:04:01 +0000 (UTC) To: Emacs development discussions Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Jun 24 16:03:54 2014 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WzRKD-0002j8-DV for ged-emacs-devel@m.gmane.org; Tue, 24 Jun 2014 16:03:53 +0200 Original-Received: from localhost ([::1]:59933 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzRKD-0007uf-04 for ged-emacs-devel@m.gmane.org; Tue, 24 Jun 2014 10:03:53 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46794) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzRK1-0007k3-Pi for emacs-devel@gnu.org; Tue, 24 Jun 2014 10:03:48 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WzRJu-0008H2-Oh for emacs-devel@gnu.org; Tue, 24 Jun 2014 10:03:41 -0400 Original-Received: from forward4o.mail.yandex.net ([37.140.190.33]:37273) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzRJu-0008GQ-B3 for emacs-devel@gnu.org; Tue, 24 Jun 2014 10:03:34 -0400 Original-Received: from smtp4o.mail.yandex.net (smtp4o.mail.yandex.net [37.140.190.29]) by forward4o.mail.yandex.net (Yandex) with ESMTP id 9C5C156404AB for ; Tue, 24 Jun 2014 18:03:17 +0400 (MSK) Original-Received: from smtp4o.mail.yandex.net (localhost [127.0.0.1]) by smtp4o.mail.yandex.net (Yandex) with ESMTP id 747BD23223D3 for ; Tue, 24 Jun 2014 18:03:17 +0400 (MSK) Original-Received: from unknown (unknown [37.139.80.10]) by smtp4o.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id SQybKEjyBY-3Hric7ht; Tue, 24 Jun 2014 18:03:17 +0400 (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client certificate not present) X-Yandex-Uniq: b69ecdaa-926f-4843-8ace-133d405a31a2 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1403618597; bh=YJB0RWVUNcZkUa3qwN7aBYmve1GeiHTe+6NK2/oakYA=; h=Message-ID:Date:From:User-Agent:MIME-Version:To:Subject: Content-Type; b=ervF5TMpBd1mrasWeUFrOSgYKcGnLptlU4mA5MjNT4YJGo+BbHQNbheJ/gYnezBzV EqeWVLUrom57BKpSqNqKAQKIGK3N3uFQ/V6gWUxNnBPG5E/Cq5HnYkXrxODjskfnZM FhwrR6Kpxbl2jaZnTMOGSFc43swY+klsacTbBCnk= Authentication-Results: smtp4o.mail.yandex.net; dkim=pass header.i=@yandex.ru User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [generic] [fuzzy] X-Received-From: 37.140.190.33 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:172680 Archived-At: This is a multi-part message in MIME format. --------------090700010607000907060706 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Recently I tried to utilize the common substring range validation function validate_subarray in Fcompare_strings (patch is attached). Surprisingly, I also found that current implementation of this function silently allows invalid ranges. E.g. this is not an error: (compare-strings "test" 0 100 "testbed" 0 4) Although 100 is invalid, this is 't'. Compare it with substring: (substring "test" 0 100) ==> not "test", but Args out of range: "test", 0, 100 Not too serious problem? Wrong. A lot of existing Lisp code assumes current semantic of compare-strings and tends to specify range end which is beyond the end of string. After applying my patch, I was unable to bootstrap, with a lot of errors from lisp/subr.el and lisp/files.el. Thoughts? Dmitry --------------090700010607000907060706 Content-Type: text/x-patch; name="compare_strings.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="compare_strings.patch" =3D=3D=3D modified file 'src/fns.c' --- src/fns.c 2014-05-21 03:49:58 +0000 +++ src/fns.c 2014-06-24 10:04:45 +0000 @@ -50,7 +50,9 @@ static Lisp_Object Qmd5, Qsha1, Qsha224, Qsha256, Qsha384, Qsha512; =20 static bool internal_equal (Lisp_Object, Lisp_Object, int, bool, Lisp_Ob= ject); -=0C +static void validate_subarray (Lisp_Object, Lisp_Object, Lisp_Object, + ptrdiff_t, EMACS_INT *, EMACS_INT *); + DEFUN ("identity", Fidentity, Sidentity, 1, 1, 0, doc: /* Return the argument unchanged. */) (Lisp_Object arg) @@ -232,6 +234,7 @@ \(exclusive). If START1 is nil, it defaults to 0, the beginning of the string; if END1 is nil, it defaults to the length of the string. Likewise, in string STR2, compare the part between START2 and END2. +Like in `substring', negative values are counted from the end. =20 The strings are compared by the numeric values of their characters. For instance, STR1 is "less than" STR2 if its first differing @@ -244,43 +247,25 @@ - 1 - N is the number of characters that match at the beginning. If string STR1 is greater, the value is a positive number N; N - 1 is the number of characters that match at the beginning. */) - (Lisp_Object str1, Lisp_Object start1, Lisp_Object end1, Lisp_Object s= tr2, Lisp_Object start2, Lisp_Object end2, Lisp_Object ignore_case) + (Lisp_Object str1, Lisp_Object start1, Lisp_Object end1, Lisp_Object s= tr2, + Lisp_Object start2, Lisp_Object end2, Lisp_Object ignore_case) { - register ptrdiff_t end1_char, end2_char; - register ptrdiff_t i1, i1_byte, i2, i2_byte; + EMACS_INT from1, to1, from2, to2; + ptrdiff_t i1, i1_byte, i2, i2_byte; =20 CHECK_STRING (str1); CHECK_STRING (str2); - if (NILP (start1)) - start1 =3D make_number (0); - if (NILP (start2)) - start2 =3D make_number (0); - CHECK_NATNUM (start1); - CHECK_NATNUM (start2); - if (! NILP (end1)) - CHECK_NATNUM (end1); - if (! NILP (end2)) - CHECK_NATNUM (end2); - - end1_char =3D SCHARS (str1); - if (! NILP (end1) && end1_char > XINT (end1)) - end1_char =3D XINT (end1); - if (end1_char < XINT (start1)) - args_out_of_range (str1, start1); - - end2_char =3D SCHARS (str2); - if (! NILP (end2) && end2_char > XINT (end2)) - end2_char =3D XINT (end2); - if (end2_char < XINT (start2)) - args_out_of_range (str2, start2); - - i1 =3D XINT (start1); - i2 =3D XINT (start2); + + validate_subarray (str1, start1, end1, SCHARS (str1), &from1, &to1); + validate_subarray (str2, start2, end2, SCHARS (str2), &from2, &to2); + + i1 =3D from1; + i2 =3D from2; =20 i1_byte =3D string_char_to_byte (str1, i1); i2_byte =3D string_char_to_byte (str2, i2); =20 - while (i1 < end1_char && i2 < end2_char) + while (i1 < to1 && i2 < to2) { /* When we find a mismatch, we must compare the characters, not just the bytes. */ @@ -307,12 +292,8 @@ =20 if (! NILP (ignore_case)) { - Lisp_Object tem; - - tem =3D Fupcase (make_number (c1)); - c1 =3D XINT (tem); - tem =3D Fupcase (make_number (c2)); - c2 =3D XINT (tem); + c1 =3D XINT (Fupcase (make_number (c1))); + c2 =3D XINT (Fupcase (make_number (c2))); } =20 if (c1 =3D=3D c2) @@ -322,15 +303,15 @@ past the character that we are comparing; hence we don't add or subtract 1 here. */ if (c1 < c2) - return make_number (- i1 + XINT (start1)); + return make_number (- i1 + from1); else - return make_number (i1 - XINT (start1)); + return make_number (i1 - from1); } =20 - if (i1 < end1_char) - return make_number (i1 - XINT (start1) + 1); - if (i2 < end2_char) - return make_number (- i1 + XINT (start1) - 1); + if (i1 < to1) + return make_number (i1 - from1 + 1); + if (i2 < to2) + return make_number (- i1 + from1 - 1); =20 return Qt; } --------------090700010607000907060706--