From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alan Mackenzie <acm@muc.de>
Newsgroups: gmane.emacs.devel
Subject: Re: Beginingless paragraphs
Date: Sat, 3 Sep 2005 12:26:23 +0000 (GMT)
Message-ID: <Pine.LNX.3.96.1050903121625.302A-100000@acm.acm>
References: <E1EBN2I-0007I0-Gb@fencepost.gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: sea.gmane.org 1125751314 22190 80.91.229.2 (3 Sep 2005 12:41:54 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sat, 3 Sep 2005 12:41:54 +0000 (UTC)
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Sep 03 14:41:46 2005
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1EBXJf-0007P6-Jp
	for ged-emacs-devel@m.gmane.org; Sat, 03 Sep 2005 14:40:16 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1EBXO0-0004jw-3K
	for ged-emacs-devel@m.gmane.org; Sat, 03 Sep 2005 08:44:44 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1EBX93-0006iO-64
	for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:29:19 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1EBX8y-0006gI-0y
	for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:29:12 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1EBX8t-0006ep-7V
	for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:29:10 -0400
Original-Received: from [193.149.49.134] (helo=acm.acm)
	by monty-python.gnu.org with esmtp (Exim 4.34) id 1EBX5L-0003vp-7C
	for emacs-devel@gnu.org; Sat, 03 Sep 2005 08:25:29 -0400
Original-Received: from localhost (root@localhost)
	by acm.acm (8.8.8/8.8.8) with SMTP id MAA00337
	for <emacs-devel@gnu.org>; Sat, 3 Sep 2005 12:26:25 GMT
X-Sender: root@acm.acm
Original-To: emacs-devel@gnu.org
In-Reply-To: <E1EBN2I-0007I0-Gb@fencepost.gnu.org>
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:42606
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/42606>

Hi, Emacs!

On Fri, 2 Sep 2005, Richard M. Stallman wrote:

>    For example, I was tearing my hair out in frustration a couple of years
>    back, trying to get the sentence/paragraph movement and filling stuff to
>    work properly in CC Mode.

>If the documentation of paragraph-start and paragraph-separate is not
>clear enough, we can clarify it.  I doubt that this would take the form
>of a "definition of paragraphs", though.  The reason is that there is no
>simple "definition of paragraphs" at the base of the current code or
>these two variables.

Read my patch and reconsider!

>The concepts that the design is based on are the concepts that you see
>in the manual.

I've worked out just what's been bugging me, and that's the definition of
`paragraph-start':  It suggests (though it doesn't quite explicitly say)
that paragraph-start matches the start of _every_ paragraph.  This isn't
true: any line following a separator line is the start of a paragraph,
whether or not paragraph-start matches it.
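
To make that concrete, here is a rough model of the rule in Python.  It
is only a sketch, not what paragraphs.el actually does: it assumes a zero
left margin, ignores the whitespace-line heuristic, and the "*" bullet
alternative in PARAGRAPH_START is a made-up setting for illustration.
The function name split_paragraphs is likewise hypothetical.

```python
import re

# Python stand-ins for the two Emacs variables (illustrative values only).
# re.match anchors at the start of the string, which models "matched at
# the left margin" when the margin is zero.
PARAGRAPH_SEPARATE = re.compile(r"[ \t\f]*$")
# Hypothetical setting: a line starting with "*" begins a paragraph.
# As the manual requires, the pattern also matches separator lines.
PARAGRAPH_START = re.compile(r"\*|[ \t\f]*$")


def split_paragraphs(lines, start=PARAGRAPH_START, separate=PARAGRAPH_SEPARATE):
    """Group LINES into paragraphs; separator lines belong to no paragraph."""
    paragraphs = []
    current = []
    for line in lines:
        if separate.match(line):
            # A separator line ends the paragraph in progress, if any.
            if current:
                paragraphs.append(current)
                current = []
        elif start.match(line) and current:
            # Break without a separator: this line starts a new paragraph.
            paragraphs.append(current)
            current = [line]
        else:
            # Either an ordinary continuation line, or the first line after
            # a separator (or at the start of the buffer).  The latter
            # starts a paragraph even if `start` does not match it.
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    print(split_paragraphs(["aaa", "bbb", "", "ccc", "* ddd", "eee"]))
```

With those six lines the model yields three paragraphs: "aaa"/"bbb",
then "ccc", then "* ddd"/"eee".  The paragraph "ccc" starts only because
it follows the blank separator line; PARAGRAPH_START never matches it.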

>    The four regexps documented on this page all define chunks of
>    natural-language text: paragraphs, pages and sentences.  So how
>    about renaming this @section something like "Sentences, Paragraphs
>    and Pages", and making the focus of the @node the definition of
>    these things in terms of the regexps, rather than the regexps
>    themselves?

>I would be glad to consider a change of this sort.

OK, here's my first shot at a patch.  As a matter of interest, what's
this node doing in "Searching and Matching"?  Would it not be more at
home under "Text"?


2005-09-03  Alan Mackenzie  <acm@muc.de>

	* searching.texi (Standard Regexps): Rename the @section to
	"Regular Expressions for Pages, Paragraphs, and Sentences".  Insert
	a full description of paragraphs.


*** searching.texi	Tue Aug 30 09:15:42 2005
--- searching-1.67.acm.texi	Sat Sep  3 12:01:10 2005
***************
*** 1643,1686 ****
  @end table
  
  @node Standard Regexps
! @section Standard Regular Expressions Used in Editing
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section describes some variables that hold regular expressions
! used for certain purposes in editing:
! 
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that separate
! pages.  The default value is @code{"^\014"} (i.e., @code{"^^L"} or
! @code{"^\C-l"}); this matches a line that starts with a formfeed
! character.
  @end defvar
  
!   The following two regular expressions should @emph{not} assume the
! match always starts at the beginning of a line; they should not use
! @samp{^} to anchor the match.  Most often, the paragraph commands do
! check for a match only at the beginning of a line, which means that
! @samp{^} would be superfluous.  When there is a nonzero left margin,
! they accept matches that start after the left margin.  In that case, a
! @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
! where a left margin is never used.
  
  @defvar paragraph-separate
! This is the regular expression for recognizing the beginning of a line
! that separates paragraphs.  (If you change this, you may have to
! change @code{paragraph-start} also.)  The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
  @end defvar
  
  @defvar paragraph-start
! This is the regular expression for recognizing the beginning of a line
! that starts @emph{or} separates paragraphs.  The default value is
! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
! whitespace or starting with a form feed (after its left margin).
  @end defvar
  
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
--- 1643,1729 ----
  @end table
  
  @node Standard Regexps
! @section Regular Expressions for Pages, Paragraphs, and Sentences
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section describes the regular expressions Emacs uses to
! recognize pages, paragraphs, and sentences.  By setting these
! variables appropriately, the Elisp programmer can control the precise
! effect of the standard commands that move over, kill, fill, mark,
! narrow to, and otherwise operate on these pieces of text.  Note that
! these variables are @emph{not} buffer local by default.
! 
! @table @asis
! @cindex page
! @item Pages
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that
! separate pages.  The default value is @code{"^\014"} (i.e.,
! @code{"^^L"} or @code{"^\C-l"}); this matches a line that starts with
! a formfeed character.
  @end defvar
  
! @cindex paragraph
! @item Paragraphs
!   Buffers divide into @dfn{paragraphs}, sequences of whole lines which
! normally don't overlap@footnote{It is possible for a blank line to be
! both the last line of one paragraph and the first line of the next.}.
! Between two paragraphs there may optionally be one or more
! @dfn{separator lines}, which aren't part of any paragraph.  The two
! regular expressions @code{paragraph-separate} and
! @code{paragraph-start} fully determine where paragraphs start and end.
! The beginning and end of the buffer always count as paragraph
! boundaries.
  
  @defvar paragraph-separate
! This regular expression recognizes a separator line by matching any
! portion of it which begins at its left margin (@pxref{Margins for
! Filling}).  (If you change this, you may have to change
! @code{paragraph-start} also.)  The default value is @w{@code{"[@
! \t\f]*$"}}, which matches a line that consists entirely of spaces,
! tabs, and form feeds (after its left margin).
  @end defvar
  
  @defvar paragraph-start
! This regular expression recognizes a line which starts a paragraph
! when the previous line is not a separator.  It need only match some
! portion beginning at the line's left margin (@pxref{Margins for
! Filling}), not the whole line.  It must also be set up to recognize a
! separator line.  The default value is @w{@code{"\f\\|[ \t]*$"}}, which
! matches a line containing only whitespace or starting with a form feed
! (after its left margin).
  @end defvar
  
+ The two variant forms of paragraph breaks are:
+ 
+ @table @asis
+ @item Paragraph break without separator lines
+ Any line, apart from a separator line, which @code{paragraph-start}
+ recognizes starts a new paragraph.
+ 
+ @item Paragraph break with separator lines
+ One or more separator lines split the old paragraph from the new one.
+ Whether @code{paragraph-start} would also recognize the first line of
+ the new paragraph is irrelevant.
+ @end table
+ 
+   As a heuristic feature, if a line tentatively recognized as the
+ start of a paragraph follows a whitespace line, the whitespace line
+ becomes the start of the paragraph instead.
+ 
+   Since the above two regular expressions, @code{paragraph-start} and
+ @code{paragraph-separate}, are matched against text at the left
+ margin, they should @emph{not} use @samp{^} to anchor the match to the
+ beginning of the line.  Most often, the paragraph commands do check
+ for a match only at the beginning of a line, which means that @samp{^}
+ would be superfluous.  When there is a nonzero left margin, they
+ accept matches that start after the left margin.  In that case, a
+ @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
+ where a left margin is never used.
+ 
+ @cindex sentence
+ @item Sentences
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
***************
*** 1700,1705 ****
--- 1743,1749 ----
  @code{sentence-end-without-period} and
  @code{sentence-end-without-space}.
  @end defun
+ @end table
  
  @ignore
     arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f


-- 
Alan Mackenzie (Munich, Germany)