From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Beginingless paragraphs
Date: Sat, 3 Sep 2005 12:26:23 +0000 (GMT)
Message-ID:
References:
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: sea.gmane.org 1125751314 22190 80.91.229.2 (3 Sep 2005 12:41:54 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sat, 3 Sep 2005 12:41:54 +0000 (UTC)
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Sep 03 14:41:46 2005
Return-path:
Original-Received: from lists.gnu.org ([199.232.76.165]) by ciao.gmane.org with esmtp (Exim 4.43) id 1EBXJf-0007P6-Jp for ged-emacs-devel@m.gmane.org; Sat, 03 Sep 2005 14:40:16 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1EBXO0-0004jw-3K for ged-emacs-devel@m.gmane.org; Sat, 03 Sep 2005 08:44:44 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1EBX93-0006iO-64 for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:29:19 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1EBX8y-0006gI-0y for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:29:12 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1EBX8t-0006ep-7V for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:29:10 -0400
Original-Received: from [193.149.49.134] (helo=acm.acm) by monty-python.gnu.org with esmtp (Exim 4.34) id 1EBX5L-0003vp-7C for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:25:29 -0400
Original-Received: from localhost (root@localhost) by acm.acm (8.8.8/8.8.8) with SMTP id MAA00337 for ; Sat, 3 Sep 2005 12:26:25 GMT
X-Sender: root@acm.acm
Original-To: emacs-devel@gnu.org
In-Reply-To:
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:42606
Archived-At:

Hi, Emacs!

On Fri, 2 Sep 2005, Richard M. Stallman wrote:

> For example, I was tearing my hair out in frustration a couple of years
> back, trying to get the sentence/paragraph movement and filling stuff to
> work properly in CC Mode.

>If the documentation of paragraph-start and paragraph-separate is not
>clear enough, we can clarify it. I doubt that this would take the form
>of a "definition of paragraphs", though. The reason is that there is no
>simple "definition of paragraphs" at the base of the current code or
>these two variables.

Read my patch and reconsider!

>The concepts that the design is based on are the concepts that you see
>in the manual.

I've worked out just what's been bugging me, and that's the definition of
`paragraph-start': It suggests (though it doesn't quite explicitly say)
that paragraph-start matches the start of _every_ paragraph. This isn't
true - any line following a separator line is the start of a paragraph.

> The four regexps documented on this page all define chunks of
> natural-language text: paragraphs, pages and sentences. So how
> about renaming this @section something like "Sentences, Paragraphs
> and Pages", and making the focus of the @node the definition of
> these things in terms of the regexps, rather than the regexps
> themselves?

>I would be glad to consider a change of this sort.

OK, here's my first shot at a patch: As a matter of interest, what's this
node doing in "Searching and Matching"? Would it not be more at home
under "Text"?


2005-09-03 Alan Mackenzie

* searching.texi (Standard Regexps): Rename the @section "Regular
Expressions for Pages, Paragraphs, and Sentences". Insert a full
description of paragraphs.
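
The rule argued for above (a line following a separator line starts a
paragraph whether or not paragraph-start matches it) can be sketched
outside Emacs. The Python below is an illustration only, not how the
paragraph commands are implemented: the function name paragraph_starts
is made up for this example, the two regexps are hand translations of
the default values quoted in the patch below, and both the left margin
and the whitespace-line heuristic are ignored.

```python
import re

# Assumed Python translations of the default Emacs values
# (Emacs writes alternation as \\| where Python writes |).
PARAGRAPH_SEPARATE = re.compile(r"[ \t\f]*$")
PARAGRAPH_START = re.compile(r"\f|[ \t]*$")


def paragraph_starts(lines):
    """Return the indices of the lines that begin a paragraph.

    Simplified model: no left margin, no whitespace-line heuristic.
    """
    starts = []
    after_boundary = True  # the beginning of the buffer is a boundary
    for i, line in enumerate(lines):
        if PARAGRAPH_SEPARATE.match(line):
            # Separator lines belong to no paragraph.
            after_boundary = True
            continue
        # A non-separator line starts a paragraph if it follows a
        # boundary, or if paragraph-start matches at its beginning.
        if after_boundary or PARAGRAPH_START.match(line):
            starts.append(i)
        after_boundary = False
    return starts


lines = [
    "first paragraph, line one",
    "first paragraph, line two",
    "",
    "second paragraph follows a separator",
    "\fthird paragraph starts with a form feed",
    "and continues here",
]
print(paragraph_starts(lines))  # prints [0, 3, 4]
```

Line 3 is reported as a paragraph start although PARAGRAPH_START does
not match it; only the preceding separator line makes it one. Line 4 is
the other case: no separator precedes it, and the regexp alone decides.
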

*** searching.texi Tue Aug 30 09:15:42 2005
--- searching-1.67.acm.texi Sat Sep 3 12:01:10 2005
***************
*** 1643,1686 ****
  @end table

  @node Standard Regexps
! @section Standard Regular Expressions Used in Editing
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing

! This section describes some variables that hold regular expressions
! used for certain purposes in editing:
!
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that separate
! pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or
! @code{"^\C-l"}); this matches a line that starts with a formfeed
! character.
  @end defvar

! The following two regular expressions should @emph{not} assume the
! match always starts at the beginning of a line; they should not use
! @samp{^} to anchor the match. Most often, the paragraph commands do
! check for a match only at the beginning of a line, which means that
! @samp{^} would be superfluous. When there is a nonzero left margin,
! they accept matches that start after the left margin. In that case, a
! @samp{^} would be incorrect. However, a @samp{^} is harmless in modes
! where a left margin is never used.

  @defvar paragraph-separate
! This is the regular expression for recognizing the beginning of a line
! that separates paragraphs. (If you change this, you may have to
! change @code{paragraph-start} also.) The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
  @end defvar

  @defvar paragraph-start
! This is the regular expression for recognizing the beginning of a line
! that starts @emph{or} separates paragraphs. The default value is
! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
! whitespace or starting with a form feed (after its left margin).
  @end defvar

  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
--- 1643,1729 ----
  @end table

  @node Standard Regexps
! @section Regular Expressions for Pages, Paragraphs, and Sentences
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing

! This section describes the regular expressions Emacs uses to
! recognize pages, paragraphs, and sentences. By setting these
! variables appropriately, the Elisp programmer can control the precise
! effect of the standard commands that move over, kill, fill, mark,
! narrow to, and otherwise operate on these pieces of text. Note that
! these variables are @emph{not} buffer local by default.
!
! @table @asis
! @cindex page
! @item Pages
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that
! separate pages. The default value is @code{"^\014"} (i.e.,
! @code{"^^L"} or @code{"^\C-l"}); this matches a line that starts with
! a formfeed character.
  @end defvar

! @cindex paragraph
! @item Paragraphs
! Buffers divide into @dfn{paragraphs}, sequences of whole lines which
! normally don't overlap@footnote{It is possible for a blank line to be
! both the last line of one paragraph and the first line of the next.}.
! Between two paragraphs there may optionally be one or more
! @dfn{separator lines}, which aren't part of any paragraph. The two
! regular expressions @code{paragraph-separate} and
! @code{paragraph-start} fully determine where paragraphs start and end.
! The beginning and end of the buffer always count as paragraph
! boundaries.

  @defvar paragraph-separate
! This regular expression recognizes a separator line by matching any
! portion of it which begins at its left margin (@pxref{Margins for
! Filling}). (If you change this, you may have to change
! @code{paragraph-start} also.) The default value is @w{@code{"[@
! \t\f]*$"}}, which matches a line that consists entirely of spaces,
! tabs, and form feeds (after its left margin).
  @end defvar

  @defvar paragraph-start
! This regular expression recognizes a line which starts a paragraph
! when the previous line is not a separator. It need only match some
! portion beginning at the line's left margin (@pxref{Margins for
! Filling}), not the whole line. It must also be set up to recognize a
! separator line. The default value is @w{@code{"\f\\|[ \t]*$"}}, which
! matches a line containing only whitespace or starting with a form feed
! (after its left margin).
  @end defvar

+ The two variant forms of paragraph breaks are:
+
+ @table @asis
+ @item Paragraph break without separator lines
+ Any line, apart from a separator line, which @code{paragraph-start}
+ recognizes starts a new paragraph.
+
+ @item Paragraph break with separator lines
+ One or more separator lines split the old paragraph from the new one.
+ Whether @code{paragraph-start} would also recognize the first line of
+ the new paragraph is irrelevant.
+ @end table
+
+ As a heuristic feature, if a line tentatively recognized as the
+ start of a paragraph follows a whitespace line, the whitespace line
+ becomes the start of the paragraph instead.
+
+ Since the above two regular expressions, @code{paragraph-start} and
+ @code{paragraph-separate}, are matched against text at the left
+ margin, they should @emph{not} use @samp{^} to anchor the match to the
+ beginning of the line. Most often, the paragraph commands do check
+ for a match only at the beginning of a line, which means that @samp{^}
+ would be superfluous. When there is a nonzero left margin, they
+ accept matches that start after the left margin. In that case, a
+ @samp{^} would be incorrect. However, a @samp{^} is harmless in modes
+ where a left margin is never used.
+
+ @cindex sentence
+ @item Sentences
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
***************
*** 1700,1705 ****
--- 1743,1749 ----
  @code{sentence-end-without-period} and
  @code{sentence-end-without-space}.
  @end defun
+ @end table

  @ignore
  arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f

--
Alan Mackenzie (Munich, Germany)