From: Alan Mackenzie <acm@muc.de>
Subject: Re: Beginingless paragraphs
Date: Sat, 3 Sep 2005 12:26:23 +0000 (GMT)
Message-ID: <Pine.LNX.3.96.1050903121625.302A-100000@acm.acm>
In-Reply-To: <E1EBN2I-0007I0-Gb@fencepost.gnu.org>
Hi, Emacs!
On Fri, 2 Sep 2005, Richard M. Stallman wrote:
> For example, I was tearing my hair out in frustration a couple of years
> back, trying to get the sentence/paragraph movement and filling stuff to
> work properly in CC Mode.
>If the documentation of paragraph-start and paragraph-separate is not
>clear enough, we can clarify it. I doubt that this would take the form
>of a "definition of paragraphs", though. The reason is that there is no
>simple "definition of paragraphs" at the base of the current code or
>these two variables.
Read my patch and reconsider!
>The concepts that the design is based on are the concepts that you see
>in the manual.
I've worked out just what's been bugging me, and that's the definition of
`paragraph-start': it suggests (though it doesn't quite explicitly say)
that paragraph-start matches the start of _every_ paragraph. This isn't
true: any line following a separator line starts a paragraph, whether or
not paragraph-start matches it.
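
To make the point concrete, here is a minimal sketch in Python (not Emacs
Lisp, and not how the paragraph commands are actually implemented). The two
regexps are the default values quoted in the patch below, transcribed to
Python syntax. The function name `paragraph_starts` is made up for
illustration, and the model assumes a zero left margin and ignores the
whitespace-line heuristic.

```python
import re

# Default values from the manual, transcribed to Python regexp syntax.
PARAGRAPH_START = re.compile(r"\f|[ \t]*$")
PARAGRAPH_SEPARATE = re.compile(r"[ \t\f]*$")


def paragraph_starts(lines):
    """Return the indices of lines that begin a paragraph (simplified model)."""
    starts = []
    # The beginning of the buffer always counts as a paragraph boundary.
    prev_is_separator = True
    for i, line in enumerate(lines):
        is_separator = PARAGRAPH_SEPARATE.match(line) is not None
        if not is_separator:
            # A non-separator line starts a paragraph if it follows a
            # separator, or if paragraph-start recognizes it.
            if prev_is_separator or PARAGRAPH_START.match(line):
                starts.append(i)
        prev_is_separator = is_separator
    return starts


buffer_lines = [
    "First paragraph, line one.",
    "line two.",
    "",
    "Second paragraph.",
    "\fThird paragraph, starting with a form feed.",
    "more text.",
]
print(paragraph_starts(buffer_lines))
```

In this model, "Second paragraph." starts a paragraph only because it
follows a separator line; PARAGRAPH_START does not match it. The form-feed
line, by contrast, starts a paragraph without any separator before it.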
> The four regexps documented on this page all define chunks of
> natural-language text: paragraphs, pages and sentences. So how
> about renaming this @section something like "Sentences, Paragraphs
> and Pages", and making the focus of the @node the definition of
> these things in terms of the regexps, rather than the regexps
> themselves?
>I would be glad to consider a change of this sort.
OK, here's my first shot at a patch. As a matter of interest, what's
this node doing in "Searching and Matching"? Would it not be more at
home under "Text"?
2005-09-03 Alan Mackenzie <acm@muc.de>
* searching.texi (Standard Regexps): Rename the @section to "Regular
Expressions for Pages, Paragraphs, and Sentences". Insert a full
description of paragraphs.
*** searching.texi Tue Aug 30 09:15:42 2005
--- searching-1.67.acm.texi Sat Sep 3 12:01:10 2005
***************
*** 1643,1686 ****
@end table
@node Standard Regexps
! @section Standard Regular Expressions Used in Editing
@cindex regexps used standardly in editing
@cindex standard regexps used in editing
! This section describes some variables that hold regular expressions
! used for certain purposes in editing:
!
@defvar page-delimiter
! This is the regular expression describing line-beginnings that separate
! pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or
! @code{"^\C-l"}); this matches a line that starts with a formfeed
! character.
@end defvar
! The following two regular expressions should @emph{not} assume the
! match always starts at the beginning of a line; they should not use
! @samp{^} to anchor the match. Most often, the paragraph commands do
! check for a match only at the beginning of a line, which means that
! @samp{^} would be superfluous. When there is a nonzero left margin,
! they accept matches that start after the left margin. In that case, a
! @samp{^} would be incorrect. However, a @samp{^} is harmless in modes
! where a left margin is never used.
@defvar paragraph-separate
! This is the regular expression for recognizing the beginning of a line
! that separates paragraphs. (If you change this, you may have to
! change @code{paragraph-start} also.) The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
@end defvar
@defvar paragraph-start
! This is the regular expression for recognizing the beginning of a line
! that starts @emph{or} separates paragraphs. The default value is
! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
! whitespace or starting with a form feed (after its left margin).
@end defvar
@defvar sentence-end
If non-@code{nil}, the value should be a regular expression describing
the end of a sentence, including the whitespace following the
--- 1643,1729 ----
@end table
@node Standard Regexps
! @section Regular Expressions for Pages, Paragraphs, and Sentences
@cindex regexps used standardly in editing
@cindex standard regexps used in editing
! This section describes the regular expressions Emacs uses to
! recognize pages, paragraphs, and sentences. By setting these
! variables appropriately, the Elisp programmer can control the precise
! effect of the standard commands that move over, kill, fill, mark,
! narrow to, and otherwise operate on these pieces of text. Note that
! these variables are @emph{not} buffer local by default.
!
! @table @asis
! @cindex page
! @item Pages
@defvar page-delimiter
! This is the regular expression describing line-beginnings that
! separate pages. The default value is @code{"^\014"} (i.e.,
! @code{"^^L"} or @code{"^\C-l"}); this matches a line that starts with
! a formfeed character.
@end defvar
! @cindex paragraph
! @item Paragraphs
! Buffers divide into @dfn{paragraphs}, sequences of whole lines which
! normally don't overlap@footnote{It is possible for a blank line to be
! both the last line of one paragraph and the first line of the next.}.
! Between two paragraphs there may optionally be one or more
! @dfn{separator lines}, which aren't part of any paragraph. The two
! regular expressions @code{paragraph-separate} and
! @code{paragraph-start} fully determine where paragraphs start and end.
! The beginning and end of the buffer always count as paragraph
! boundaries.
@defvar paragraph-separate
! This regular expression recognizes a separator line by matching any
! portion of it which begins at its left margin (@pxref{Margins for
! Filling}). (If you change this, you may have to change
! @code{paragraph-start} also.) The default value is @w{@code{"[@
! \t\f]*$"}}, which matches a line that consists entirely of spaces,
! tabs, and form feeds (after its left margin).
@end defvar
@defvar paragraph-start
! This regular expression recognizes a line which starts a paragraph
! when the previous line is not a separator. It need only match some
! portion beginning at the line's left margin (@pxref{Margins for
! Filling}), not the whole line. It must also be set up to recognize a
! separator line. The default value is @w{@code{"\f\\|[ \t]*$"}}, which
! matches a line containing only whitespace or starting with a form feed
! (after its left margin).
@end defvar
+ The two variant forms of paragraph breaks are:
+
+ @table @asis
+ @item Paragraph break without separator lines
+ Any line, apart from a separator line, which @code{paragraph-start}
+ recognizes starts a new paragraph.
+
+ @item Paragraph break with separator lines
+ One or more separator lines split the old paragraph from the new one.
+ Whether @code{paragraph-start} would also recognize the first line of
+ the new paragraph is irrelevant.
+ @end table
+
+ As a heuristic feature, if a line tentatively recognized as the
+ start of a paragraph follows a whitespace line, the whitespace line
+ becomes the start of the paragraph instead.
+
+ Since the above two regular expressions, @code{paragraph-start} and
+ @code{paragraph-separate}, are matched against text at the left
+ margin, they should @emph{not} use @samp{^} to anchor the match to the
+ beginning of the line. Most often, the paragraph commands do check
+ for a match only at the beginning of a line, which means that @samp{^}
+ would be superfluous. When there is a nonzero left margin, they
+ accept matches that start after the left margin. In that case, a
+ @samp{^} would be incorrect. However, a @samp{^} is harmless in modes
+ where a left margin is never used.
+
+ @cindex sentence
+ @item Sentences
@defvar sentence-end
If non-@code{nil}, the value should be a regular expression describing
the end of a sentence, including the whitespace following the
***************
*** 1700,1705 ****
--- 1743,1749 ----
@code{sentence-end-without-period} and
@code{sentence-end-without-space}.
@end defun
+ @end table
@ignore
arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f
--
Alan Mackenzie (Munich, Germany)