From: Drew Adams
Newsgroups: gmane.emacs.devel
Subject: RE: Text property searching
Date: Mon, 16 Apr 2018 13:05:28 -0700 (PDT)
Message-ID: <78f73e87-367d-4ab9-abfe-a1a60cbc44eb@default>
To: Lars Ingebrigtsen, emacs-devel

> Below is a draft of the documentation of this function. Does it all
> make sense? :-)

(I'm going only by your doc/description, not the code, which I
don't have and won't bother to try to access.)

What if someone doesn't want to gather strings but instead wants
the match-zone limits? E.g., instead of returning buffer
substrings for the matches, return conses (beg . end).

This is (should be) mainly about searching the _buffer_. It is
not (should not be) mainly about gathering a list of matching
strings (or a defstruct holding such a list).

IOW, this sounds wrong, to me:

    This function is modelled after ‘search-forward’ and friends in
    that it moves point, but it returns a structure that describes
    the match instead of returning it in ‘match-beginning’ and
    friends.

And better than it returning (beg . end) conses is for it to just
provide access, on demand, to the matched text and its positions
using `match-data' - the usual Emacs approach.

IOW, better for it to _really_ be "modeled after `search-forward'"
- to find and return a buffer position. (`search-forward' does
not just "move point" - it returns it.)

With `search-forward' the side effect of matching lets you easily
do various things with the `match-data' (always only on demand).

Why return a structure here? Why even build a structure and put
the relevant info into it?
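For comparison, here is the ordinary `search-forward' idiom in question, as a minimal sketch (`my-collect-matches' is a hypothetical name, not an existing function). Point moves, the return value is a buffer position, and the caller reads positions (or, via `match-string', the text) from the match data only when it wants them:

```elisp
;; Hypothetical helper illustrating the usual `search-forward' idiom.
(defun my-collect-matches (string)
  "Return a list of (BEG . END) conses for each occurrence of STRING after point.
Point is left after the last match."
  (let ((zones '()))
    ;; NOERROR = t: return nil instead of signaling when no match remains.
    (while (search-forward string nil t)
      ;; The match data is consulted only here, on demand; a caller who
      ;; wanted the text could use (match-string 0) instead.
      (push (cons (match-beginning 0) (match-end 0)) zones))
    (nreverse zones)))

;; Usage:
(with-temp-buffer
  (insert "foo bar foo")
  (goto-char (point-min))
  (my-collect-matches "foo"))
;; => ((1 . 4) (9 . 12))
```

Nothing is built up front: the caller chooses positions, text, or both.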

Why not let the usual kind of `search-forward'-using code work just
as well with your minor variant: get whatever info you want, on
demand, from the `match-data'?

The current design sounds a bit analogous to tossing out
`match-data' in favor of just `match-string'. Except that you even
_return_ the strings, in a defstruct no less. That might seem to
be convenient for someone who always wants the strings, but it
sounds less useful generally.

Similarly, I'd think we would want all of the same optional args
and behavior as are provided by `search-forward': limiting the
search scope, raising or suppressing an error, and repeating for a
given count. That's a proven and widely used Emacs interface.

In sum, why isn't `search-forward' a proper model in all respects?

>  -- Function: text-property-search-forward prop value predicate
>      Search for the next region that has text property PROP set to VALUE
>      according to PREDICATE.
>
>      This function is modelled after ‘search-forward’ and friends in
>      that it moves point, but it returns a structure that describes the
>      match instead of returning it in ‘match-beginning’ and friends.
>
>      If the text property can’t be found, the function returns ‘nil’.
>      If it’s found, point is placed at the end of the region that has
>      this text property match, and a ‘prop-match’ structure is returned.
>
>      PREDICATE can either be ‘t’ (which is a synonym for ‘equal’), ‘nil’
>      (which means “not equal”), or a predicate that will be called with
>      two parameters: The first is VALUE, and the second is the value of
>      the text property we’re inspecting.
>
>      In the examples below, imagine that you’re in a buffer that looks
>      like this:
>
>           This is a bold and here's bolditalic and this is the end.
>
>      That is, the “bold” words are the ‘bold’ face, and the “italic”
>      word is in the ‘italic’ face.
>
>      With point at the start:
>
>           (while (setq match (text-property-search-forward 'face 'bold t))
>             (push (buffer-substring (prop-match-beginning match)
>                                     (prop-match-end match))
>                   words))
>
>      This will pick out all the words that use the ‘bold’ face.
>
>           (while (setq match (text-property-search-forward 'face nil t))
>             (push (buffer-substring (prop-match-beginning match)
>                                     (prop-match-end match))
>                   words))
>
>      This will pick out all the bits that have no face properties, which
>      will result in the list ‘("This is a " "and here's " "and this is
>      the end")’ (only reversed, since we used ‘push’).
>
>           (while (setq match (text-property-search-forward 'face nil nil))
>             (push (buffer-substring (prop-match-beginning match)
>                                     (prop-match-end match))
>                   words))
>
>      This will pick out all the regions where ‘face’ is set to
>      something, but this is split up into where the properties change,
>      so the result here will be ‘"bold" "bold" "italic"’.
>
>      For a more realistic example where you might use this, consider
>      that you have a buffer where certain sections represent URLs, and
>      these are tagged with ‘shr-url’.
>
>           (while (setq match (text-property-search-forward 'shr-url nil nil))
>             (push (prop-match-value match) urls))
>
>      This will give you a list of all those URLs.
>
> ---
>
> Hm... it strikes me now that the two last parameters should be
> optional, since (text-property-search-forward 'shr-url) would then be
> even more obvious in its meaning.
>
> --
> (domestic pets only, the antidote for overdose, milk.)
> bloggy blog: http://lars.ingebrigtsen.no
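A minimal sketch of the `search-forward'-style interface argued for above. Everything here is an assumption for illustration: `my-text-property-search-forward' is a hypothetical name, not an existing function; it matches PROP's value with `equal' only (no PREDICATE argument); and it assumes COUNT is positive and BOUND, if given, is not before point. Like `search-forward', it returns point on success and leaves the matched positions and text in the match data, to be used only on demand:

```elisp
;; Hypothetical sketch, not an existing Emacs function.  Assumes exact
;; `equal' matching of PROP's value and a positive COUNT.
(defun my-text-property-search-forward (prop value &optional bound noerror count)
  "Search forward for a run of text whose PROP value is `equal' to VALUE.
Modeled on `search-forward': BOUND limits the search, NOERROR says
what to do on failure, and COUNT (a positive integer, default 1)
finds the COUNTth run.  On success, set the match data to the last
run found, move point to its end and return point."
  (let ((limit (or bound (point-max)))
        (count (or count 1))
        (pos (point))
        (found nil))
    (while (and (> count 0) pos)
      ;; Skip stretches whose PROP value differs from VALUE.
      (while (and (< pos limit)
                  (not (equal (get-text-property pos prop) value)))
        (setq pos (next-single-property-change pos prop nil limit)))
      (if (>= pos limit)
          (setq pos nil)                ; Ran out of text: give up.
        ;; POS starts a matching run; it ends where PROP next changes.
        (let ((end (next-single-property-change pos prop nil limit)))
          (setq found (cons pos end)
                pos end
                count (1- count)))))
    (if (= count 0)
        (progn
          ;; Expose the run through the usual match data.
          (set-match-data (list (car found) (cdr found)))
          (goto-char (cdr found)))
      ;; Failure is handled the way `search-forward' handles it.
      (cond ((null noerror)
             (signal 'search-failed (list (format "%S = %S" prop value))))
            ((eq noerror t) nil)
            (t (goto-char limit) nil)))))

;; Usage, with the buffer from the draft's examples:
(with-temp-buffer
  (insert "This is a " (propertize "bold" 'face 'bold)
          " and here's " (propertize "bold" 'face 'bold)
          (propertize "italic" 'face 'italic)
          " and this is the end.")
  (goto-char (point-min))
  (let ((words '()))
    (while (my-text-property-search-forward 'face 'bold nil t)
      ;; Text is fetched from the match data only because we ask for it.
      (push (match-string-no-properties 0) words))
    (nreverse words)))
;; => ("bold" "bold")
```

The caller decides what to gather: `match-string-no-properties' here, but (cons (match-beginning 0) (match-end 0)) would give the match-zone limits instead, with no defstruct involved.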