all messages for Emacs-related lists mirrored at yhetil.org
From: Alan Mackenzie <acm@muc.de>
Subject: Re: Beginingless paragraphs
Date: Sat, 3 Sep 2005 12:26:23 +0000 (GMT)
Message-ID: <Pine.LNX.3.96.1050903121625.302A-100000@acm.acm>
In-Reply-To: <E1EBN2I-0007I0-Gb@fencepost.gnu.org>

Hi, Emacs!

On Fri, 2 Sep 2005, Richard M. Stallman wrote:

>    For example, I was tearing my hair out in frustration a couple of years
>    back, trying to get the sentence/paragraph movement and filling stuff to
>    work properly in CC Mode.

>If the documentation of paragraph-start and paragraph-separate is not
>clear enough, we can clarify it.  I doubt that this would take the form
>of a "definition of paragraphs", though.  The reason is that there is no
>simple "definition of paragraphs" at the base of the current code or
>these two variables.

Read my patch and reconsider!

>The concepts that the design is based on are the concepts that you see
>in the manual.

I've worked out just what's been bugging me, and that's the definition of
`paragraph-start':  It suggests (though it doesn't quite explicitly say)
that paragraph-start matches the start of _every_ paragraph.  This isn't
true: any line following a separator line is the start of a paragraph,
whether or not paragraph-start matches it.
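To make the point concrete, here is a rough sketch of the rule in Python.  It is only an illustration, not Emacs Lisp and not how `forward-paragraph' is actually implemented.  It assumes a zero left margin, translates the documented default values of paragraph-separate and paragraph-start into Python regexp syntax (Emacs "\\|" becomes "|"), and ignores the heuristic that moves a paragraph start back onto a preceding whitespace line.  The helper paragraph_starts is hypothetical.

```python
import re

# Assumed Python translations of the documented Emacs defaults.
PARAGRAPH_SEPARATE = r"[ \t\f]*$"   # line of only spaces, tabs, form feeds
PARAGRAPH_START = r"\f|[ \t]*$"     # form feed, or whitespace-only line


def paragraph_starts(lines, start_re=PARAGRAPH_START, sep_re=PARAGRAPH_SEPARATE):
    """Return the indices of the lines that begin a paragraph (hypothetical helper).

    A line that is not a separator starts a paragraph when:
      * it is the first line of the buffer (a paragraph boundary), or
      * the previous line is a separator line, or
      * start_re matches at its beginning.
    The regexps are applied with re.match, i.e. anchored at column 0,
    which stands in for "at the left margin" when that margin is zero.
    """
    starts = []
    prev_is_sep = True  # the beginning of the buffer counts as a boundary
    for i, line in enumerate(lines):
        is_sep = re.match(sep_re, line) is not None
        # Separator lines never start a paragraph; a line after a separator
        # always does, whether or not start_re recognizes it.
        if not is_sep and (prev_is_sep or re.match(start_re, line)):
            starts.append(i)
        prev_is_sep = is_sep
    return starts


if __name__ == "__main__":
    # "b" starts a paragraph only because it follows a separator line.
    print(paragraph_starts(["a", "", "b"]))  # [0, 2]
    lines = ["Intro line", "more intro", "", "- item one",
             "  continued", "- item two"]
    print(paragraph_starts(lines))  # [0, 3]
    # A mode-specific paragraph-start (an assumed example) adds a break
    # without any separator line before "- item two".
    print(paragraph_starts(lines, start_re=r"\f|[ \t]*$|- "))  # [0, 3, 5]
```

With the defaults, "- item one" starts a paragraph even though paragraph-start does not match it; only the custom start regexp makes "- item two" a paragraph start, since no separator line precedes it.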

>    The four regexps documented on this page all define chunks of
>    natural-language text: paragraphs, pages and sentences.  So how
>    about renaming this @section something like "Sentences, Paragraphs
>    and Pages", and making the focus of the @node the definition of
>    these things in terms of the regexps, rather than the regexps
>    themselves?

>I would be glad to consider a change of this sort.

OK, here's my first shot at a patch:  As a matter of interest, what's
this node doing in "Searching and Matching"?  Would it not be more at
home under "Text"?



2005-09-03  Alan Mackenzie  <acm@muc.de>

	* searching.texi (Standard Regexps): Rename the @section to "Regular
	Expressions for Pages, Paragraphs, and Sentences".  Insert a full
	description of paragraphs.


*** searching.texi	Tue Aug 30 09:15:42 2005
--- searching-1.67.acm.texi	Sat Sep  3 12:01:10 2005
***************
*** 1643,1686 ****
  @end table
  
  @node Standard Regexps
! @section Standard Regular Expressions Used in Editing
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section describes some variables that hold regular expressions
! used for certain purposes in editing:
! 
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that separate
! pages.  The default value is @code{"^\014"} (i.e., @code{"^^L"} or
! @code{"^\C-l"}); this matches a line that starts with a formfeed
! character.
  @end defvar
  
!   The following two regular expressions should @emph{not} assume the
! match always starts at the beginning of a line; they should not use
! @samp{^} to anchor the match.  Most often, the paragraph commands do
! check for a match only at the beginning of a line, which means that
! @samp{^} would be superfluous.  When there is a nonzero left margin,
! they accept matches that start after the left margin.  In that case, a
! @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
! where a left margin is never used.
  
  @defvar paragraph-separate
! This is the regular expression for recognizing the beginning of a line
! that separates paragraphs.  (If you change this, you may have to
! change @code{paragraph-start} also.)  The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
  @end defvar
  
  @defvar paragraph-start
! This is the regular expression for recognizing the beginning of a line
! that starts @emph{or} separates paragraphs.  The default value is
! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
! whitespace or starting with a form feed (after its left margin).
  @end defvar
  
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
--- 1643,1729 ----
  @end table
  
  @node Standard Regexps
! @section Regular Expressions for Pages, Paragraphs, and Sentences
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section describes the regular expressions Emacs uses to
! recognize pages, paragraphs, and sentences.  By setting these
! variables appropriately, the Elisp programmer can control the precise
! effect of the standard commands that move over, kill, fill, mark,
! narrow to, and otherwise operate on these pieces of text.  Note that
! these variables are @emph{not} buffer local by default.
! 
! @table @asis
! @cindex page
! @item Pages
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that
! separate pages.  The default value is @code{"^\014"} (i.e.,
! @code{"^^L"} or @code{"^\C-l"}); this matches a line that starts with
! a formfeed character.
  @end defvar
  
! @cindex paragraph
! @item Paragraphs
!   Buffers divide into @dfn{paragraphs}, sequences of whole lines which
! normally don't overlap@footnote{It is possible for a blank line to be
! both the last line of one paragraph and the first line of the next.}.
! Between two paragraphs there may optionally be one or more
! @dfn{separator lines}, which aren't part of any paragraph.  The two
! regular expressions @code{paragraph-separate} and
! @code{paragraph-start} fully determine where paragraphs start and end.
! The beginning and end of the buffer always count as paragraph
! boundaries.
  
  @defvar paragraph-separate
! This regular expression recognizes a separator line by matching any
! portion of it which begins at its left margin (@pxref{Margins for
! Filling}).  (If you change this, you may have to change
! @code{paragraph-start} also.)  The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
  @end defvar
  
  @defvar paragraph-start
! This regular expression recognizes a line which starts a paragraph
! when the previous line is not a separator.  It need only match some
! portion beginning at the line's left margin (@pxref{Margins for
! Filling}), not the whole line.  It must also be set up to recognize a
! separator line.  The default value is @w{@code{"\f\\|[ \t]*$"}}, which
! matches a line containing only whitespace or starting with a form feed
! (after its left margin).
  @end defvar
  
+ The two variant forms of paragraph breaks are:
+ 
+ @table @asis
+ @item Paragraph break without separator lines
+ Any line, apart from a separator line, which @code{paragraph-start}
+ recognizes starts a new paragraph.
+ 
+ @item Paragraph break with separator lines
+ One or more separator lines split the old paragraph from the new one.
+ Whether @code{paragraph-start} would also recognize the first line of
+ the new paragraph is irrelevant.
+ @end table
+ 
+   As a heuristic feature, if a line tentatively recognized as the
+ start of a paragraph follows a whitespace line, the whitespace line
+ becomes the start of the paragraph instead.
+ 
+   Since the above two regular expressions, @code{paragraph-start} and
+ @code{paragraph-separate}, are matched against text at the left
+ margin, they should @emph{not} use @samp{^} to anchor the match to the
+ beginning of the line.  Most often, the paragraph commands do check
+ for a match only at the beginning of a line, which means that @samp{^}
+ would be superfluous.  When there is a nonzero left margin, they
+ accept matches that start after the left margin.  In that case, a
+ @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
+ where a left margin is never used.
+ 
+ @cindex sentence
+ @item Sentences
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
***************
*** 1700,1705 ****
--- 1743,1749 ----
  @code{sentence-end-without-period} and
  @code{sentence-end-without-space}.
  @end defun
+ @end table
  
  @ignore
     arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f



-- 
Alan Mackenzie (Munich, Germany)
