* Beginingless paragraphs
@ 2005-08-30 10:50 Alan Mackenzie
  2005-08-30 11:48 ` Benjamin Riefenstahl
  2005-08-31 14:36 ` Richard M. Stallman
  0 siblings, 2 replies; 20+ messages in thread
From: Alan Mackenzie @ 2005-08-30 10:50 UTC

Hi, Emacs!

What is a "paragraph" in Emacs?  I can't find a @dfn{paragraph} anywhere
in the Emacs/Elisp manuals.  I don't have the full CVS of Lispref, but
grepping in the released version didn't produce any hits.

There are definitions of the two paragraph regexps in the Elisp Manual:

 - Variable: paragraph-separate
     This is the regular expression for recognizing the beginning of a
     line that separates paragraphs.  (If you change this, you may have
     to change `paragraph-start' also.)  The default value is
     `"[ \t\f]*$"', which matches a line that consists entirely of
     spaces, tabs, and form feeds (after its left margin).

 - Variable: paragraph-start
     This is the regular expression for recognizing the beginning of a
     line that starts _or_ separates paragraphs.  The default value is
     `"[ \t\n\f]"', which matches a line starting with a space, tab,
     newline, or form feed (after its left margin).

Then there is this evasive "definition" of paragraph in the Emacs manual
(taken from text.texi V1.57):

    The precise definition of a paragraph boundary is controlled by the
    variables `paragraph-separate' and `paragraph-start'.  The value of
    `paragraph-start' is a regexp that should match any line that either
    starts or separates paragraphs.  The value of `paragraph-separate'
    is another regexp that should match only lines that separate
    paragraphs without being part of any paragraph (for example, blank
    lines).  Lines that start a new paragraph and are contained in it
    must match only `paragraph-start', not `paragraph-separate'.  Each
    regular expression must match at the left margin.  For example, in
    Fundamental mode, `paragraph-start' is `"\f\\|[ \t]*$"', and
    `paragraph-separate' is `"[ \t\f]*$"'.

I don't really want to know what _controls_ the definition of a
paragraph boundary.  I want that definition itself without having to
resort to a kind of contorted reverse logic to get to it.  And believe
me, working through that logic is hard.  Doing that gives this:

    A paragraph ends just before a line which matches paragraph-start
    (?or at EOB).
    A paragraph starts at a line which matches p-start, but _doesn't_
    match p-separate (?or at BOB).

What happens if these two regexps are the same (as they are by default
in Text mode)?  There cannot be any lines which start paragraphs, only
lines which separate them.  Beginningless paragraphs!  This is absurd.

Incidentally, what happens when a line matches p-separate but not
p-start?  Shouldn't ever happen, but it surely will.

At this point, the Elisp programmer, reduced to tears, starts reading
the source code to find out what a paragraph really is.  Non-programmers
(and there will be a fair number of them using Text mode) need to start
experimenting with regexps.

SURELY there should be a proper @dfn{paragraph} in the Emacs manual?
I'm too confused to write one myself at the moment.

    "Like a circle in a spiral,
    Like a wheel within a wheel.
    Never starting, only ending
    On an ever-spinning reel.
    As the images unwind
    Like the paragraphs you find
    In the buffers of your mind."

--
Alan Mackenzie (Munich, Germany)
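The reverse logic worked out in the message above can be made concrete with a small sketch. It is only a rough model in Python, not what Emacs itself does: it assumes a zero left margin, rewrites the Fundamental mode regexps quoted from the manual in Python `re` syntax (`\\|` becomes `|`), and the name `classify` is made up for illustration.

```python
import re

# Fundamental mode values quoted from the Emacs manual, rewritten in
# Python regexp syntax (an assumption of this sketch).
PARAGRAPH_START = r"\f|[ \t]*$"
PARAGRAPH_SEPARATE = r"[ \t\f]*$"


def classify(line, start=PARAGRAPH_START, separate=PARAGRAPH_SEPARATE):
    """Label one line (given without its newline) by the manual's rules."""
    if re.match(separate, line):
        return "separator"   # separates paragraphs, belongs to none
    if re.match(start, line):
        return "start"       # matches paragraph-start only
    return "body"            # matches neither regexp


for sample in ["", "   ", "\fNew page text", "ordinary text"]:
    print(repr(sample), "->", classify(sample))

# The "beginningless" case: when both regexps are identical, no line can
# ever be labelled "start".
same = r"[ \t\f]*$"
print({classify(s, start=same, separate=same) for s in ["", "\fx", "abc"]})
```

Checking `separate` first mirrors the manual's wording that a starting line "must match only `paragraph-start', not `paragraph-separate'".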
* Re: Beginingless paragraphs
From: Benjamin Riefenstahl @ 2005-08-30 11:48 UTC
Cc: emacs-devel

Hi Alan,

Alan Mackenzie writes:
> What is a "paragraph" in Emacs?

The info node that you quoted is (info "(emacs)Paragraphs"), I think?
It has this text (this is a rather old CVS version, so it may be
different by now):

    [...] Blank lines and text-formatter command lines separate
    paragraphs and are not considered part of any paragraph. [...]

    [...] In major modes for programs, paragraphs begin and end only at
    blank lines.  This makes the paragraph commands continue to be
    useful even though there are no paragraphs per se. [...]

Technically the term "paragraph" is defined not by some global
definition but by the individual modes via the variables that you have
cited.  So every mode can have its own idea of what a paragraph is.

E.g. the buffer that I'm currently writing in to compose this message
is in GNUS' "Message" mode.  It tries to accommodate the marker that
separates the headers from the message body and also attribution lines
like the above "Alan Mackenzie writes:".

benny
* Re: Beginingless paragraphs
From: Richard M. Stallman @ 2005-08-31 14:36 UTC
Cc: emacs-devel

    What is a "paragraph" in Emacs?  I can't find a @dfn{paragraph}
    anywhere in the Emacs/Elisp manuals.

Do we need one?  Certainly in the Emacs Lisp manual we don't.
It is a high-level concept used in just a few user-level commands.

Defining paragraphs better might be useful in the Emacs Manual.
Would this really make a difference for users, though?  I am not sure.

Right now the manual effectively takes for granted that users
know what a paragraph is.  Is this a problem?
* Re: Beginingless paragraphs
From: Eli Zaretskii @ 2005-08-31 17:00 UTC
Cc: acm, emacs-devel

> From: "Richard M. Stallman" <rms@gnu.org>
> Date: Wed, 31 Aug 2005 10:36:53 -0400
> Cc: emacs-devel@gnu.org
>
> Right now the manual effectively takes for granted that users
> know what a paragraph is.  Is this a problem?

IMHO, we only need to define what's a paragraph if its Emacs definition
differs significantly from what a naive user will expect.
* Re: Beginingless paragraphs
From: Alan Mackenzie @ 2005-08-31 18:11 UTC
Cc: emacs-devel

Hi, Emacs!

On Wed, 31 Aug 2005, Richard M. Stallman wrote:

>    What is a "paragraph" in Emacs?  I can't find a @dfn{paragraph}
>    anywhere in the Emacs/Elisp manuals.

>Do we need one?  Certainly in the Emacs Lisp manual we don't.
>It is a high-level concept used in just a few user-level commands.

I disagree most strongly here.  Hackers need to know how to set up
paragraph-s\(tart\|eparate\) so that they can make the canonical
paragraph commands behave the way they want them to.

For example, I was tearing my hair out in frustration a couple of years
back, trying to get the sentence/paragraph movement and filling stuff to
work properly in CC Mode.  (Comment prefixes and escaped newlines in
strings complicate things.)  In the end, I had to edebugger my way
through forward-paragraph to get a handle on things.

For another example, it would be nice if M-q in a Shell Script mode
comment would regard a line like "# " as a blank line (i.e. paragraph
separator).  (Maybe this has already been done for 22.1 - if so,
apologies.)  This would be easier, I think, with a clear definition of a
paragraph.

I think it would be far easier to understand a description like "The
paragraph functions recognise the start of a paragraph as <expression
containing p-start and p-separate> and the end of a paragraph as
<another such expression>." than the ones saying "p-start matches ...."
and "p-separate matches ...".

>Defining paragraphs better might be useful in the Emacs Manual.

I think they should either be properly defined there or the references
to paragraph-s\(tart\|eparate\) replaced by an xref to a description in
the Elisp manual.  The logical absurdity ("beginningless paragraphs")
implicit in the manual surely should be sorted out one way or the other.

>Would this really make a difference for users, though?  I am not sure.

>Right now the manual effectively takes for granted that users know what
>a paragraph is.  Is this a problem?

I think Users will know exactly what a paragraph is until they've seen
the descriptions of paragraph-start and paragraph-separate, after which
they'll not be so sure any more.  ;-)

#########################################################################

Looking at @node "Standard Regexps" in Elisp's searching.texi, perhaps
the node's title should be amended.  What does "Standard Regular
Expressions used in Editing" really say?  Well, "Standard" and "used in
Editing" both seem content-free, so all the title really means is "A few
regexps".

The four regexps documented on this page all define chunks of
natural-language text: paragraphs, pages and sentences.  So how about
renaming this @section something like "Sentences, Paragraphs and Pages",
and making the focus of the @node the definition of these things in
terms of the regexps, rather than the regexps themselves?

It goes without saying, I'm ready and willing to make these amendments
myself.

--
Alan Mackenzie (Munich, Germany)
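The Shell Script wish in the message above (a comment line such as "# " acting as a paragraph separator) can be sketched as a regexp. This is a hypothetical value written in Python `re` syntax for illustration only. It is not what Shell Script mode actually uses, and the names `COMMENT_AWARE_SEPARATE` and `is_separator` are invented here.

```python
import re

# Hypothetical paragraph-separate for shell-script comments, in Python
# regexp syntax: either a blank line, or a line holding nothing but "#"
# characters and whitespace.
COMMENT_AWARE_SEPARATE = r"[ \t\f]*$|[ \t]*#+[ \t]*$"


def is_separator(line):
    """True if the line would separate paragraphs under the regexp above."""
    return re.match(COMMENT_AWARE_SEPARATE, line) is not None


comment = [
    "# First paragraph of a comment,",
    "# continued on a second line.",
    "# ",
    "# Second paragraph.",
]
for line in comment:
    print(repr(line), is_separator(line))
```

In Emacs the same alternative would also have to be added to paragraph-start, since that regexp is expected to match separator lines too.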
* Re: Beginingless paragraphs
From: Richard M. Stallman @ 2005-09-01 15:53 UTC
Cc: emacs-devel

    >Do we need one?  Certainly in the Emacs Lisp manual we don't.
    >It is a high-level concept used in just a few user-level commands.

    I disagree most strongly here.  Hackers need to know how to set up
    paragraph-s\(tart\|eparate\) so that they can make the canonical
    paragraph commands behave the way they want them to.

This is precisely what the Emacs Manual documents now.
* Re: Beginingless paragraphs
From: Alan Mackenzie @ 2005-09-01 17:56 UTC

Hi, Emacs!

On Thu, 1 Sep 2005, Richard M. Stallman wrote:

>    >Do we need one?  Certainly in the Emacs Lisp manual we don't.
>    >It is a high-level concept used in just a few user-level commands.

>    I disagree most strongly here.  Hackers need to know how to set up
>    paragraph-s\(tart\|eparate\) so that they can make the canonical
>    paragraph commands behave the way they want them to.

>This is precisely what the Emacs Manual documents now.

I think "precisely" is an overstatement.

I'd find it helpful here if other people would step in and say how they
find the existing documentation of paragraphs, paragraph-start and
paragraph-separate.  If others also find it confusing and obscure, then
it should be improved, and I'm willing to do the improving.  If, on the
other hand, others can read and understand it without too much
difficulty, then it's my personal problem which I'll have to deal with
myself in private.

--
Alan Mackenzie (Munich, Germany)
* Re: Beginingless paragraphs
From: Thien-Thi Nguyen @ 2005-09-01 23:17 UTC
Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> >This is precisely what the Emacs Manual documents now.
>
> I think "precisely" is an overstatement.
>
> I'd find it helpful here if other people would step in and say how
> they find the existing documentation of paragraphs, paragraph-start and
> paragraph-separate.

fwiw, i found docs w/ "grep".  first in lispref/*.texi, then in
man/*.texi.  the docs i found were adequate enough to start
experimenting in an emacs session.

probably if i were reading them outside of emacs (hypothetical
situation :-), they would not be "complete" and thus, depending on my
mood, i might find them adequate or not.

thi
* Re: Beginingless paragraphs
From: Richard M. Stallman @ 2005-09-03 1:42 UTC
Cc: emacs-devel

    >    I disagree most strongly here.  Hackers need to know how to set up
    >    paragraph-s\(tart\|eparate\) so that they can make the canonical
    >    paragraph commands behave the way they want them to.

    >This is precisely what the Emacs Manual documents now.

    I think "precisely" is an overstatement.

No it isn't.  The topic that the existing text documents is precisely
the topic that you asked for.

If you suggest making the text clearer, that may be desirable, but it is
a different issue.
* Re: Beginingless paragraphs
From: Richard M. Stallman @ 2005-09-03 1:41 UTC
Cc: emacs-devel

    For example, I was tearing my hair out in frustration a couple of years
    back, trying to get the sentence/paragraph movement and filling stuff to
    work properly in CC Mode.

If the documentation of paragraph-start and paragraph-separate is not
clear enough, we can clarify it.  I doubt that this would take the form
of a "definition of paragraphs", though.  The reason is that there is no
simple "definition of paragraphs" at the base of the current code or
these two variables.

The concepts that the design is based on are the concepts that you see
in the manual.

    The four regexps documented on this page all define chunks of
    natural-language text: paragraphs, pages and sentences.  So how about
    renaming this @section something like "Sentences, Paragraphs and Pages",
    and making the focus of the @node the definition of these things in
    terms of the regexps, rather than the regexps themselves?

I would be glad to consider a change of this sort.
* Re: Beginingless paragraphs
From: Alan Mackenzie @ 2005-09-03 12:26 UTC

Hi, Emacs!

On Fri, 2 Sep 2005, Richard M. Stallman wrote:

>    For example, I was tearing my hair out in frustration a couple of years
>    back, trying to get the sentence/paragraph movement and filling stuff to
>    work properly in CC Mode.

>If the documentation of paragraph-start and paragraph-separate is not
>clear enough, we can clarify it.  I doubt that this would take the form
>of a "definition of paragraphs", though.  The reason is that there is no
>simple "definition of paragraphs" at the base of the current code or
>these two variables.

Read my patch and reconsider!

>The concepts that the design is based on are the concepts that you see
>in the manual.

I've worked out just what's been bugging me, and that's the definition
of `paragraph-start':  It suggests (though it doesn't quite explicitly
say) that paragraph-start matches the start of _every_ paragraph.  This
isn't true - any line following a separator line is the start of a
paragraph.

>    The four regexps documented on this page all define chunks of
>    natural-language text: paragraphs, pages and sentences.  So how
>    about renaming this @section something like "Sentences, Paragraphs
>    and Pages", and making the focus of the @node the definition of
>    these things in terms of the regexps, rather than the regexps
>    themselves?

>I would be glad to consider a change of this sort.

OK, here's my first shot at a patch:  As a matter of interest, what's
this node doing in "Searching and Matching"?  Would it not be more at
home under "Text"?


2005-09-03  Alan Mackenzie  <acm@muc.de>

	* searching.texi (Standard Regexps): Rename the @section "Regular
	Expressions for Pages, Paragraphs, and Sentences".  Insert a full
	description of paragraphs.

*** searching.texi	Tue Aug 30 09:15:42 2005
--- searching-1.67.acm.texi	Sat Sep  3 12:01:10 2005
***************
*** 1643,1686 ****
  @end table

  @node Standard Regexps
! @section Standard Regular Expressions Used in Editing
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing

! This section describes some variables that hold regular expressions
! used for certain purposes in editing:
!
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that separate
! pages.  The default value is @code{"^\014"} (i.e., @code{"^^L"} or
! @code{"^\C-l"}); this matches a line that starts with a formfeed
! character.
  @end defvar

! The following two regular expressions should @emph{not} assume the
! match always starts at the beginning of a line; they should not use
! @samp{^} to anchor the match.  Most often, the paragraph commands do
! check for a match only at the beginning of a line, which means that
! @samp{^} would be superfluous.  When there is a nonzero left margin,
! they accept matches that start after the left margin.  In that case, a
! @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
! where a left margin is never used.

  @defvar paragraph-separate
! This is the regular expression for recognizing the beginning of a line
! that separates paragraphs.  (If you change this, you may have to
! change @code{paragraph-start} also.)  The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
  @end defvar

  @defvar paragraph-start
! This is the regular expression for recognizing the beginning of a line
! that starts @emph{or} separates paragraphs.  The default value is
! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
! whitespace or starting with a form feed (after its left margin).
  @end defvar

  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
--- 1643,1729 ----
  @end table

  @node Standard Regexps
! @section Regular Expressions for Pages, Paragraphs, and Sentences
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing

! This section describes the regular expressions Emacs uses to
! recognize pages, paragraphs, and sentences.  By setting these
! variables appropriately, the Elisp programmer can control the precise
! effect of the standard commands that move over, kill, fill, mark,
! narrow to, and otherwise operate on these pieces of text.  Note that
! these variables are @emph{not} buffer local by default.
!
! @table @asis
! @cindex page
! @item Pages
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that
! separate pages.  The default value is @code{"^\014"} (i.e.,
! @code{"^^L"} or @code{"^\C-l"}); this matches a line that starts with
! a formfeed character.
  @end defvar

! @cindex paragraph
! @item Paragraphs
! Buffers divide into @dfn{paragraphs}, sequences of whole lines which
! normally don't overlap@footnote{It is possible for a blank line to be
! both the last line of one paragraph and the first line of the next.}.
! Between two paragraphs there may optionally be one or more
! @dfn{separator lines}, which aren't part of any paragraph.  The two
! regular expressions @code{paragraph-separate} and
! @code{paragraph-start} fully determine where paragraphs start and end.
! The beginning and end of the buffer always count as paragraph
! boundaries.

  @defvar paragraph-separate
! This regular expression recognizes a separator line by matching any
! portion of it which begins at its left margin (@pxref{Margins for
! Filling}).  (If you change this, you may have to change
! @code{paragraph-start} also.)  The default value is @w{@code{"[@
! \t\f]*$"}}, which matches a line that consists entirely of spaces,
! tabs, and form feeds (after its left margin).
  @end defvar

  @defvar paragraph-start
! This regular expression recognizes a line which starts a paragraph
! when the previous line is not a separator.  It need only match some
! portion beginning at the line's left margin (@pxref{Margins for
! Filling}), not the whole line.  It must also be set up to recognize a
! separator line.  The default value is @w{@code{"\f\\|[ \t]*$"}}, which
! matches a line containing only whitespace or starting with a form feed
! (after its left margin).
  @end defvar

+ The two variant forms of paragraph breaks are:
+
+ @table @asis
+ @item Paragraph break without separator lines
+ Any line, apart from a separator line, which @code{paragraph-start}
+ recognizes starts a new paragraph.
+
+ @item Paragraph break with separator lines
+ One or more separator lines split the old paragraph from the new one.
+ Whether @code{paragraph-start} would also recognize the first line of
+ the new paragraph is irrelevant.
+ @end table
+
+ As a heuristic feature, if a line tentatively recognized as the
+ start of a paragraph follows a whitespace line, the whitespace line
+ becomes the start of the paragraph instead.
+
+ Since the above two regular expressions, @code{paragraph-start} and
+ @code{paragraph-separate}, are matched against text at the left
+ margin, they should @emph{not} use @samp{^} to anchor the match to the
+ beginning of the line.  Most often, the paragraph commands do check
+ for a match only at the beginning of a line, which means that @samp{^}
+ would be superfluous.  When there is a nonzero left margin, they
+ accept matches that start after the left margin.  In that case, a
+ @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
+ where a left margin is never used.
+
+ @cindex sentence
+ @item Sentences
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
***************
*** 1700,1705 ****
--- 1743,1749 ----
  @code{sentence-end-without-period} and
  @code{sentence-end-without-space}.
  @end defun
+ @end table

  @ignore
     arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f

--
Alan Mackenzie (Munich, Germany)
* Re: Beginingless paragraphs
From: Richard M. Stallman @ 2005-09-04 16:49 UTC
Cc: emacs-devel

Overall, the change is good, but there are many details that
should be done differently.

    I've worked out just what's been bugging me, and that's the definition
    of `paragraph-start':  It suggests (though it doesn't quite explicitly
    say) that paragraph-start matches the start of _every_ paragraph.  This
    isn't true - any line following a separator line is the start of a
    paragraph.

That is true, but paragraph-start does have a match at or just before
the start of every paragraph.  That's because it is supposed to match
separator lines, too.

    OK, here's my first shot at a patch:  As a matter of interest, what's
    this node doing in "Searching and Matching"?

Because that's where regexps are.

    ! This section describes the regular expressions Emacs uses to
    ! recognize pages, paragraphs, and sentences.  By setting these
    ! variables appropriately, the Elisp programmer can control the precise

Please write "Emacs Lisp".

    ! @table @asis
    ! @cindex page
    ! @item Pages
      @defvar page-delimiter
    ! This is the regular expression describing line-beginnings that
    ! separate pages.  The default value is @code{"^\014"} (i.e.,

Using @defvar inside of @table is a peculiar thing to do.  It may look
bad in TeX or in Makeinfo.

    ! This is the regular expression describing line-beginnings that

"Describing" is vague; what it does is match them.

    ! Buffers divide into @dfn{paragraphs},

That is a strange way to put it.  It sounds like you're saying that
buffers actually split up.  It would be better to make this
parallel to the info about pages.

    ! normally don't overlap@footnote{It is possible for a blank line to be
    ! both the last line of one paragraph and the first line of the next.}.

Are you sure?  I don't think so.  A blank line would normally
be a separator line, not the first or last line of any paragraph.

    ! This regular expression recognizes a line which starts a paragraph
    ! when the previous line is not a separator.  It need only match some
    ! portion beginning at the line's left margin (@pxref{Margins for
    ! Filling}), not the whole line.  It must also be set up to recognize a
    ! separator line.

I think this way of putting it is less clear than the current way:
that it should match lines that either start or separate paragraphs.

    + The two variant forms of paragraph breaks are:
    +
    + @table @asis
    + @item Paragraph break without separator lines
    + Any line, apart from a separator line, which @code{paragraph-start}
    + recognizes starts a new paragraph.
    +
    + @item Paragraph break with separator lines
    + One or more separator lines split the old paragraph from the new one.
    + Whether @code{paragraph-start} would also recognize the first line of
    + the new paragraph is irrelevant.
    + @end table

Itemizing these two is a good idea (but you should use @itemize, not
@table).  However, calling them "variant forms" is confusing.  I
suggest calling them "two ways that paragraphs can be separated".
Also I suggest swapping them, because the first one is the usual case
and the simplest case to understand.

    +
    + As a heuristic feature,

The phrase "heuristic feature" does not make sense to me.

                              if a line tentatively recognized as the
    + start of a paragraph follows a whitespace line, the whitespace line
    + becomes the start of the paragraph instead.

That is a confusing way to put it.  It's clearer to say "is included
in the paragraph" than "becomes the start of the paragraph".
* Re: Beginingless paragraphs: second stab at a patch. 2005-09-04 16:49 ` Richard M. Stallman @ 2005-09-07 19:17 ` Alan Mackenzie 2005-09-08 9:04 ` Richard M. Stallman 0 siblings, 1 reply; 20+ messages in thread From: Alan Mackenzie @ 2005-09-07 19:17 UTC (permalink / raw) Hi, Richard! Here is mark II of my patch to searching.texi, incorporating most of the changes you suggested. It isn't yet finished - I haven't made any amendments to the "sentence" bits - so I haven't included a ChangeLog entry. I'd appreciate further criticism on it. On Sun, 4 Sep 2005, Richard M. Stallman wrote: [ .... ] >Using @defvar inside of @table is a peculiar thing to do. It may look >bad in TeX or in Makeinfo. I really wanted @subheading, which I've since found in the Texinfo manual and now put into the text. > ! This is the regular expression describing line-beginnings that >"Describing" is vague; what it does is match them. I hadn't actually touched the bit about pages. I have now! > ! Buffers divide into @dfn{paragraphs}, > >That is a strange way to put it. It sounds like you're saying that >buffers actually split up. It would be better to make this >parallel to the info about pages. I was trying to suggest (i) that _all_ buffers have paragraphs, not just "special" buffers, for whatever value of special; (ii) The set of paragraphs in a buffer together with the separator lines COVER a buffer; it is not the case that a buffer might have an isolated paragraph hiding away somewhere inside it. (iii) Also, I was trying to avoid using the passive voice. I solved (i) by saying explicitly at the top that all buffers have p, p, and s. (ii)+(iii) are more difficult. In the version of the patch I'm submitting with this email, I've left a passive in. I can't find a way of expressing it which reads well and avoids passives. Suggestions would be welcome. > ! normally don't overlap@footnote{It is possible for a blank line to be > ! 
both the last line of one paragraph and the first line of the next.}. >Are you sure? I don't think so. A blank line would normally >be a separator line, not the first or last line of any paragraph. Try out this file: ------------------------------------------------------ 1st Line [starter] asdf 1st Line [starter] asdf - Local Variables: paragraph-separate: "-" paragraph-start: "1st Line\\|-" End: ----------------------------------------------------- Do M-h on each of the lines "asdf". The blank line is included in both paragraphs. This happens because the blank line isn't a separator here. It is an ordinary line of the upper paragraph and the "heuristic" (sorry about the word) blank line tacked on to the paragraph below. Not something to lose too much sleep about, perhaps. I've toned down the bit about it in the patch. Here is the patch: *** searching-1.67.texi Tue Aug 30 09:15:42 2005 --- searching-1.67.acm.texi Wed Sep 7 16:49:38 2005 *************** *** 1643,1685 **** @end table @node Standard Regexps ! @section Standard Regular Expressions Used in Editing @cindex regexps used standardly in editing @cindex standard regexps used in editing ! This section describes some variables that hold regular expressions ! used for certain purposes in editing: @defvar page-delimiter ! This is the regular expression describing line-beginnings that separate ! pages. The default value is @code{"^\014"} (i.e., @code{"^^L"} or ! @code{"^\C-l"}); this matches a line that starts with a formfeed ! character. @end defvar ! The following two regular expressions should @emph{not} assume the ! match always starts at the beginning of a line; they should not use ! @samp{^} to anchor the match. Most often, the paragraph commands do ! check for a match only at the beginning of a line, which means that ! @samp{^} would be superfluous. When there is a nonzero left margin, ! they accept matches that start after the left margin. In that case, a ! @samp{^} would be incorrect. 
However, a @samp{^} is harmless in modes ! where a left margin is never used. @defvar paragraph-separate ! This is the regular expression for recognizing the beginning of a line ! that separates paragraphs. (If you change this, you may have to ! change @code{paragraph-start} also.) The default value is ! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of ! spaces, tabs, and form feeds (after its left margin). @end defvar @defvar paragraph-start ! This is the regular expression for recognizing the beginning of a line ! that starts @emph{or} separates paragraphs. The default value is ! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only ! whitespace or starting with a form feed (after its left margin). @end defvar @defvar sentence-end If non-@code{nil}, the value should be a regular expression describing --- 1643,1750 ---- @end table @node Standard Regexps ! @section Regular Expressions for Pages, Paragraphs, and Sentences @cindex regexps used standardly in editing @cindex standard regexps used in editing ! This section specifies precisely what pages, paragraphs, and ! sentences are in Emacs and the regular expressions it uses to ! recognize them. By setting these variables appropriately, the Emacs ! Lisp programmer can control the precise effect of the standard ! commands that move over, kill, fill, mark, narrow to, and otherwise ! operate on these pieces of text. Note that these variables are ! @emph{not} buffer local by default. ! ! Although the notions of pages, paragraphs, and sentences are mostly ! useful in modes for natural language text, the commands which use ! these textual units work in @emph{all} buffers. ! ! @cindex page ! @subheading Pages ! ! A @dfn{page} in an Emacs buffer is an expanse of text extending from ! just after a @dfn{page delimiter} to just after the next one---a page ! delimiter is part of the page it terminates. A page delimiter is an ! 
arbitrarily defined sequence of text which starts at column zero and ! may extend over several lines. By default it is a single formfeed at ! column zero. The beginning and end of the buffer also count as page ! boundaries. @defvar page-delimiter ! This is the regular expression that matches a page delimiter. It ! should be anchored to the beginning of the line (i.e. it should start ! with @samp{^}). The default value is @code{"^\014"} (i.e., ! @code{"^^L"} or @code{"^\C-l"}). @end defvar ! @cindex paragraph ! @subheading Paragraphs ! Buffers in Emacs can be viewed as consisting of @dfn{Paragraphs}, ! certain sequences of whole lines. The two regular expressions ! @code{paragraph-separate} and @code{paragraph-start} determine where ! they start and end. Paragraphs don't overlap@footnote{In certain ! obscure circumstances it is possible for a blank line to be both the ! last line of one paragraph and the first line of the next.}. Between ! two paragraphs there are often one or more @dfn{separator lines}, ! which aren't part of any paragraph. The beginning and end of the ! buffer always count as paragraph boundaries. ! ! The two ways that paragraphs can be separated are: ! ! @itemize @bullet ! @item ! With separator lines---one or more separator lines split the old ! paragraph from the new one. Whether @code{paragraph-start} would also ! recognize the first line of the new paragraph is irrelevant. ! ! @item ! Without separator lines---any line, apart from a separator line, which ! @code{paragraph-start} recognizes starts a new paragraph. This might ! be an indented line, for example. ! @end itemize @defvar paragraph-separate ! This regular expression recognizes a separator line by matching any ! portion of it which begins at its left margin (@pxref{Margins}). (If ! you change this, you may have to change @code{paragraph-start} also.) ! The default value is @w{@code{"[@ \t\f]*$"}}, which matches a line ! 
that consists entirely of spaces, tabs, and form feeds (after its left ! margin). @end defvar @defvar paragraph-start ! This regular expression recognizes @emph{either} a line which starts a ! paragraph when the previous line is not a separator @emph{or} a ! separator line. It need only match some portion beginning at the ! line's left margin (@pxref{Margins}), not the whole line. The default ! value is @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing ! only whitespace or starting with a form feed (after its left margin). @end defvar + + Additionally, if a line tentatively recognized as the start of a + paragraph follows a whitespace line, the whitespace line is included + in the paragraph. + + The usual values of @code{paragraph-separate} and + @code{paragraph-start} contain @samp{\f} (a formfeed) and thus + constrain paragraphs (and hence sentences) to end at a page boundary. + This works well for the way page separators are mostly used in Emacs. + If you want paragraphs to straddle page boundaries, like they do in + printed books, set these variables to, say, @w{@code{"[@ \t]*$"}} and + @w{@code{"[@ \t]*$"}}. + + Since the above two regular expressions, @code{paragraph-start} and + @code{paragraph-separate}, are matched against text at the left + margin, they should @emph{not} use @samp{^} to anchor the match to the + beginning of the line. Most often, the paragraph commands do check + for a match only at the beginning of a line, which means that @samp{^} + would be superfluous. When there is a nonzero left margin, they + accept matches that start after the left margin. In that case, a + @samp{^} would be incorrect. However, a @samp{^} is harmless in modes + where a left margin is never used. + + @cindex sentence + @subheading Sentences @defvar sentence-end If non-@code{nil}, the value should be a regular expression describing -- Alan Mackenzie (Munich, Germany) ^ permalink raw reply [flat|nested] 20+ messages in thread
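As a rough illustration of how the two regexps described in the patch sort lines into separators, starters and ordinary lines, here is a minimal Emacs Lisp sketch. It assumes a zero left margin and matches against strings rather than buffer text, so it says nothing about the extra blank-line heuristic in forward-paragraph. `sketch-line-kind' is a hypothetical helper, not a function in paragraphs.el.

```elisp
;; Sketch only: classify one line of text along the lines of the
;; documentation above, assuming a zero left margin.
;; `sketch-line-kind' is a hypothetical name, not part of Emacs.
(defun sketch-line-kind (line separate start)
  "Return `separator', `starter' or `ordinary' for the string LINE.
SEPARATE and START play the roles of `paragraph-separate' and
`paragraph-start'; both are matched at the beginning of LINE."
  (cond
   ;; A separator line is not part of any paragraph.
   ((string-match-p (concat "\\`\\(?:" separate "\\)") line) 'separator)
   ;; A line matched only by START begins a new paragraph.
   ((string-match-p (concat "\\`\\(?:" start "\\)") line) 'starter)
   (t 'ordinary)))

;; The values from the Local Variables section of the example file.
(let ((sep "-")
      (start "1st Line\\|-"))
  (mapcar (lambda (line) (sketch-line-kind line sep start))
          '("1st Line [starter]" "asdf" "" "-")))
;; => (starter ordinary ordinary separator)
```

With those settings the empty line comes out as an ordinary line rather than a separator, which agrees with the explanation above of why the blank line in the example file is not treated as a separator.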
* Re: Beginingless paragraphs: second stab at a patch.
  2005-09-07 19:17 ` Beginingless paragraphs: second stab at a patch Alan Mackenzie
@ 2005-09-08 9:04 ` Richard M. Stallman
  2005-10-19 16:56 ` Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.] Alan Mackenzie
  0 siblings, 1 reply; 20+ messages in thread
From: Richard M. Stallman @ 2005-09-08 9:04 UTC (permalink / raw)
  Cc: emacs-devel

    Do M-h on each of the lines "asdf".  The blank line is included in both
    paragraphs.  This happens because the blank line isn't a separator here.
    It is an ordinary line of the upper paragraph and the "heuristic" (sorry
    about the word) blank line tacked on to the paragraph below.

I think this is a bug.  The blank line should not be included in both
paragraphs, only in one of them.  If something special is to be done
to include the blank line in the following paragraph, then something
special should also be done to exclude it from the preceding paragraph.

Would you like to implement that?

^ permalink raw reply [flat|nested] 20+ messages in thread
* Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.] 2005-09-08 9:04 ` Richard M. Stallman @ 2005-10-19 16:56 ` Alan Mackenzie 2005-10-20 4:54 ` Richard M. Stallman 0 siblings, 1 reply; 20+ messages in thread From: Alan Mackenzie @ 2005-10-19 16:56 UTC (permalink / raw) Hi, Richard, Hi Emacs! On Thu, 8 Sep 2005, Richard M. Stallman wrote: > Do M-h on each of the lines "asdf". The blank line is included in both > paragraphs. This happens because the blank line isn't a separator here. > It is an ordinary line of the upper paragraph and the "heuristic" (sorry > about the word) blank line tacked on to the paragraph below. >I think this is a bug. The blank line should not be included in both >paragraphs, only in one of them. If something special is to be done >to include the blank line in the following paragraph, then something >special should also be done to exclude it from the preceding paragraph. >Would you like to implement that? I've been having a look at this. There are actually several other bugs in forward-paragraph too - for example: (i) When there is a left margin, sometimes forward paragraph moves to column zero, sometimes to after the margin. I think it should always move to after the margin if it can. (It can't when the line doesn't contain enough whitespace to constitute the margin.) (ii) When `use-hard-newlines' is non-nil (i.e. Longlines Mode is enabled), forward-paragraph can spuriously recognise a line "in the middle of a paragraph" as a separator line when it "looks like" one. (This only shows itself with a non-blank separator line.) (iii) When there is a non-nil fill-prefix, f-p uses it in place of paragraph-start when searching backwards, but not when searching forwards. Half of these cases is a bug. 
######################################################################### Incidentally: The Elisp manual is a bit unclear on the page "Paragraphs", where it says: When there is a fill prefix, then paragraphs are delimited by all lines which don't start with the fill prefix. This doesn't make clear (? clear enough) whether this scheme of terminating paragraphs is in addition to or in place of paragraph-s\(tart\|eparate\). I think one of the words "also" and "instead" should be inserted before "delimited". Which one? ######################################################################### As a matter of interest, I think there are perilously many complications in forward-paragraph. There are left margins, fill-prefixes, hard newlines (Longlines Mode), and paragraph-s\(tart\|eparate\). Each of these interacts with every other in some way, mostly in ways that have never been formally specified. For example, how does Emacs behave when there's a left margin and Longlines Mode is enabled? (I'll try this out myself later). AAMOI (2): Having told people precisely how to set up paragraph-s\(tart\|eparate\), why does forward-paragraph then go to such lengths to correct incorrect settings: It removes spurious leading "^"s from these variables, and it combines the two into a regexp for searching, even though p-start should match anything that p-separate does. Is this really the Right Thing to do? Would it not be better, having checked the variables, to nag their developer with an output to *Messages*? The current method will conceal bugs rather than exposing them. ######################################################################### As a first step to fixing the bugs in forward-paragraph, I've cleaned up the code somewhat and added some comments (though some of these are prolix enough to need removing later on). The patch below is intended only to clean up the code without changing its function. Should I commit this patch?
If anybody wants to test the change, here is the test file I used to test out forward-paragraph: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -*-fundamental-*- In this file, hack-variables has set paragraph-start/separate and fill-column. As appropriate, set fill-prefix to "* *" and the left margin to 3 spaces. For the last paragraphs, do M-x longlines-mode. Paragraph following separator line, not matching paragraph-start, no margin. Second line of this paragraph. asdf Paragraph matching paragraph-start, not following separator, no margin. Second line of this paragraph. Next paragraph * *asdf * * * *Paragraph with fill-prefix "* *", following a bare fill-prefix, no margin. * *Second line of this paragraph. * * * *Paragraph with fill-prefix "* *", following a separator line, no margin. * *Second line of this paragraph. * * Paragraph following separator line, not matching paragraph-start, 3-space margin. The preceding and following blank lines are truly blank. Paragraph following separator line, not matching paragraph-start, 3-space margin. The preceding and following "blank" lines are actually 3-space margins. M-{ goes to column zero, however, not to after the margin. This is probably a bug. asdf Paragraph matching paragraph-start, not following separator, 3-space margin. Second line of this paragraph. Next paragraph. * *asdf * * * *Paragraph with fill-prefix "* *", following a bare fill-prefix, 3-space * *margin. * *Paragraph with fill-prefix "* *", following a separator line, 3-space * *margin. This paragraph should be tested with Lonlines Mode enabled. Its second line ;sep; will look like a separator line without actually being one. It also has a third line. This paragraph should be tested with Longlines Mode enable. Its second line ;sep; actually is a separator line. This paragraph should be tested with Longlines Mode enabled. 
Its second line, * *which follows a soft newline, looks like it begins with a fill-prefix, but doesn't really. ;start; Local Variables: paragraph-start: ";s\\(tart\\|ep\\);\\|[ \t\n\f]" paragraph-separate: ";sep;\\|[ \t\f]*$" fill-column: 78 end: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Here is the patch: 2005-10-19 Alan Mackenzie <acm@muc.de> * paragraphs.el (forward-paragraph): Clean up the code. In particular: Eliminate the unused/frivolous variables found-start, multiple-lines. Reduce the complexity of some nested and/or/progn forms. Comment the code liberally. Index: paragraphs.el =================================================================== RCS file: /cvsroot/emacs/emacs/lisp/textmodes/paragraphs.el,v retrieving revision 1.79 diff -c -r1.79 paragraphs.el *** paragraphs.el 6 Aug 2005 17:41:15 -0000 1.79 --- paragraphs.el 19 Oct 2005 16:43:07 -0000 *************** *** 211,220 **** --- 211,227 ---- ;; starting at the left-margin. This allows paragraph commands to ;; work normally with indented text. ;; This hack will not find problem cases like "whatever\\|^something". + ;; It only has any effect if paragraph-start has been given a value + ;; which violates its specification. (parstart (if (and (not (equal "" paragraph-start)) (equal ?^ (aref paragraph-start 0))) (substring paragraph-start 1) paragraph-start)) + ;; `parsep' (pronounced "par-sepp", not "parse-pee") is a regexp + ;; which matches any separator line (including a bare fill-prefix), + ;; but NOT a paragraph starting line. + ;; Note that the next form has no effect if paragraph-separate has a + ;; valid value. (parsep (if (and (not (equal "" paragraph-separate)) (equal ?^ (aref paragraph-separate 0))) (substring paragraph-separate 1) *************** *** 224,259 **** (concat parsep "\\|" fill-prefix-regexp "[ \t]*$") parsep)) ! ;; This is used for searching. (sp-parstart (concat "^[ \t]*\\(?:" parstart "\\|" parsep "\\)")) ! 
start found-start) (while (and (< arg 0) (not (bobp))) (if (and (not (looking-at parsep)) (re-search-backward "^\n" (max (1- (point)) (point-min)) t) (looking-at parsep)) (setq arg (1+ arg)) - (setq start (point)) - ;; Move back over paragraph-separating lines. (forward-char -1) (beginning-of-line) (while (and (not (bobp)) (progn (move-to-left-margin) (looking-at parsep))) (forward-line -1)) (if (bobp) nil (setq arg (1+ arg)) ;; Go to end of the previous (non-separating) line. (end-of-line) ! ;; Search back for line that starts or separates paragraphs. (if (if fill-prefix-regexp ! ;; There is a fill prefix; it overrides parstart. ! (let (multiple-lines) (while (and (progn (beginning-of-line) (not (bobp))) (progn (move-to-left-margin) (not (looking-at parsep))) (looking-at fill-prefix-regexp)) - (unless (= (point) start) - (setq multiple-lines t)) (forward-line -1)) (move-to-left-margin) ;; This deleted code caused a long hanging-indent line --- 231,281 ---- (concat parsep "\\|" fill-prefix-regexp "[ \t]*$") parsep)) ! ;; `sp-parstart' matches a line that "looks like" white space ! ;; followed by a paragraph separator or starter. The WS is ! ;; putatively a left margin. Once a line matching this regexp has ! ;; been found, more detailed checking is needed to say whether it ! ;; actually is a separator/starter line. ! ;; Note that if paragraph-start/separate have valid values, `parsep' ! ;; is redundant in the following. (sp-parstart (concat "^[ \t]*\\(?:" parstart "\\|" parsep "\\)")) ! start) (while (and (< arg 0) (not (bobp))) + ;; Crude kludge to do with a blank line being part of the following + ;; paragraph. This is to be removed. (ACM, 2005/10/19) (if (and (not (looking-at parsep)) (re-search-backward "^\n" (max (1- (point)) (point-min)) t) (looking-at parsep)) (setq arg (1+ arg)) (forward-char -1) (beginning-of-line) + + ;; Move back over paragraph-separating lines. 
(while (and (not (bobp)) (progn (move-to-left-margin) (looking-at parsep))) (forward-line -1)) + ;; We're at the margin of the first non-separator line we've met. (if (bobp) nil (setq arg (1+ arg)) ;; Go to end of the previous (non-separating) line. (end-of-line) ! ! ;; Currently, we're at the EOL on the first non-separator line we've ! ;; encountered. ! ;; ! ;; Search back over the meat of the paragraph for a starter or ! ;; separator line. The condition in the following big `if' form ! ;; evaluates to t if we find such a line, nil if we hit BOB. (if (if fill-prefix-regexp ! ;; If there is a fill prefix, paragraph-start is ignored. ! ;; So is the Longlines mechanism - is this the Right Thing? ! ;; (ACM, 2005/10/19). ! (progn (while (and (progn (beginning-of-line) (not (bobp))) (progn (move-to-left-margin) (not (looking-at parsep))) (looking-at fill-prefix-regexp)) (forward-line -1)) (move-to-left-margin) ;; This deleted code caused a long hanging-indent line *************** *** 265,302 **** ;; multiple-lines ;; (forward-line 1)) (not (bobp))) (while (and (re-search-backward sp-parstart nil 1) ! (setq found-start t) ! ;; Found a candidate, but need to check if it is a ! ;; REAL parstart. ! (progn (setq start (point)) ! (move-to-left-margin) ! (not (looking-at parsep))) ! (not (and (looking-at parstart) ! (or (not use-hard-newlines) ! (bobp) ! (get-text-property ! (1- start) 'hard))))) ! (setq found-start nil) ! (goto-char start)) ! found-start) ! ;; Found one. (progn ;; Move forward over paragraph separators. ;; We know this cannot reach the place we started ;; because we know we moved back over a non-separator. (while (and (not (eobp)) (progn (move-to-left-margin) (looking-at parsep))) (forward-line 1)) ;; If line before paragraph is just margin, back up to there. (end-of-line 0) (if (> (current-column) (current-left-margin)) (forward-char 1) (skip-chars-backward " \t") (if (not (bolp)) (forward-line 1)))) ! ;; No starter or separator line => use buffer beg. 
(goto-char (point-min)))))) (while (and (> arg 0) (not (eobp))) --- 287,336 ---- ;; multiple-lines ;; (forward-line 1)) (not (bobp))) + + ;; fill-prefix is currently null. (while (and (re-search-backward sp-parstart nil 1) ! ;; point is now at BOL of a candidate start/sep line. ! (setq start (point)) ! (save-excursion ! (move-to-left-margin) ! (and (not (looking-at parsep)) ! (or (not (looking-at parstart)) ! (and use-hard-newlines ! (not (bobp)) ! (not (get-text-property ! (1- start) 'hard)))))))) ! (or (bobp) (move-to-left-margin)) ! (not (bobp))) ! ! ;; We've found a starter or separator line at or just before the ! ;; start of paragraph we're looking for. The start of paragraph ! ;; can be either a starter line or the line following a ! ;; separator. We're at the line's left margin. (progn ;; Move forward over paragraph separators. ;; We know this cannot reach the place we started ;; because we know we moved back over a non-separator. + ;; + ;; Surely we can move forward over at most one such line? + ;; (ACM, 2005/10/19) (while (and (not (eobp)) (progn (move-to-left-margin) (looking-at parsep))) (forward-line 1)) ;; If line before paragraph is just margin, back up to there. + ;; + ;; This is the frivolous inclusion of a blank line in a + ;; paragraph it precedes. This is to be taken out (ACM, + ;; 2005/10/19). (end-of-line 0) (if (> (current-column) (current-left-margin)) (forward-char 1) (skip-chars-backward " \t") (if (not (bolp)) (forward-line 1)))) ! ! ;; We hit BOB instead of finding a starter or separator line. (goto-char (point-min)))))) (while (and (> arg 0) (not (eobp))) *************** *** 315,332 **** (not (looking-at parsep)) (looking-at fill-prefix-regexp)) (forward-line 1)) (while (and (re-search-forward sp-parstart nil 1) ! (progn (setq start (match-beginning 0)) ! (goto-char start) ! (not (eobp))) ! (progn (move-to-left-margin) ! (not (looking-at parsep))) ! (or (not (looking-at parstart)) ! (and use-hard-newlines !
(not (get-text-property (1- start) 'hard))))) (forward-char 1)) (if (< (point) (point-max)) ! (goto-char start)))) (constrain-to-field nil opoint t) ;; Return the number of steps that could not be done. arg)) --- 349,375 ---- (not (looking-at parsep)) (looking-at fill-prefix-regexp)) (forward-line 1)) + ;; Search forward for the next separator or starter line. Either of + ;; these counts as just after the current paragraph. The left margin + ;; and hard newlines make this searching clumsy. + ;; + ;; What about a non-nil fill-prefix here? Shouldn't that override + ;; paragraph-start here, the way it does for backward movement? (ACM, + ;; 2005/10/19) (while (and (re-search-forward sp-parstart nil 1) ! (goto-char (match-beginning 0)) ! (setq start (point)) ; Remove this later. ! (not (eobp)) ! (save-excursion ! (move-to-left-margin) ! (and (not (looking-at parsep)) ! (or (not (looking-at parstart)) ! (and use-hard-newlines ! (not (get-text-property (1- start) 'hard))))))) (forward-char 1)) (if (< (point) (point-max)) ! (goto-char start)))) ; This is BOL. Why not to the left ! ; margin? (ACM, 2005/10/19) (constrain-to-field nil opoint t) ;; Return the number of steps that could not be done. arg)) -- Alan Mackenzie (Munich, Germany) ^ permalink raw reply [flat|nested] 20+ messages in thread
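The compatibility step that the patch comments on (dropping a leading "^" from each variable, then combining the two into one search regexp) can be sketched in isolation as below. This is a simplified model, not the code in paragraphs.el: it ignores fill-prefix entirely, and both function names are hypothetical.

```elisp
;; Sketch only; neither function exists in Emacs.
(defun sketch-strip-leading-caret (regexp)
  "Return REGEXP without a single leading \"^\", if it has one.
Like the form in the patch, this misses cases such as
\"whatever\\\\|^something\", where the caret is not the first character."
  (if (and (not (equal "" regexp))
           (equal ?^ (aref regexp 0)))
      (substring regexp 1)
    regexp))

(defun sketch-search-regexp (start separate)
  "Combine START and SEPARATE into one regexp for line searches.
This follows the shape of `sp-parstart' in the patch, with optional
leading whitespace standing in for a possible left margin."
  (concat "^[ \t]*\\(?:"
          (sketch-strip-leading-caret start)
          "\\|"
          (sketch-strip-leading-caret separate)
          "\\)"))

(sketch-strip-leading-caret "^;sep;")                 ; => ";sep;"
(sketch-strip-leading-caret ";sep;")                  ; => ";sep;"
(sketch-strip-leading-caret "whatever\\|^something")  ; returned unchanged
```

A line found with the combined regexp is only a candidate; as the comments in the patch note, it still has to be checked at the left margin against the separate and start regexps individually.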
* Re: Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.] 2005-10-19 16:56 ` Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.] Alan Mackenzie @ 2005-10-20 4:54 ` Richard M. Stallman 2005-10-20 13:53 ` Alan Mackenzie 0 siblings, 1 reply; 20+ messages in thread From: Richard M. Stallman @ 2005-10-20 4:54 UTC (permalink / raw) Cc: emacs-devel (i) When there is a left margin, sometimes forward paragraph moves to column zero, sometimes to after the margin. I think it should always move to after the margin if it can. (It can't when the line doesn't contain enough whitespace to constitute the margin.) I am not sure of that. We have to look at how this affects use of M-h C-w to move a paragraph, followed by doing M-} M-} M-} C-y to yank it back in between two other paragraphs. That is the crucial criterion for what these commands should do. So please try that in all the various cases you can think of, and try to design the behavior of paragraph starts so that it leads to desirable results in that usage. (ii) When `use-hard-newlines' is non-nil (i.e. Longlines Mode is enabled), forward-paragraph can spuriously recognise a line "in the middle of a paragraph" as a separator line when it "looks like" one. (This only shows itself with a non-blank separator line.) Sorry, I do not understand. (iii) When there is a non-nil fill-prefix, f-p uses it in place of paragraph-start when searching backwards, but not when searching forwards. Half of these cases is a bug. I am not sure either one of them is right. What is right is to implement the proper criterion--see the next issue. Incidentally: The Elisp manual is a bit unclear on the page "Paragraphs", where it says: When there is a fill prefix, then paragraphs are delimited by all lines which don't start with the fill prefix. This doesn't make clear (? 
clear enough) whether this scheme of terminating paragraphs is in addition to or in place of paragraph-s\(tart\|eparate\). I think one of the words "also" and "instead" should be inserted before "delimited". Which one? It should be "in addition". If the fill-prefix is followed by text that would normally separate or start paragraphs, that does separate or start paragraphs. Also, a line that doesn't start with the fill-prefix separates paragraphs. So this is three paragraphs if the fill prefix is "**********". **********Here is one paragraph **********in two lines. ********** **********Here is another paragraph. **********Here is a third paragraph **********in two lines. M-{ and M-} seem to work properly on that text; I see no bug. AAMOI (2): Having told people precisely how to set up paragraph-s\(tart\|eparate\), why does forward-paragraph then go to such lengths to correct incorrect settings: It removes spurious leading "^"s from these variables, and it combine the two into a regexp for searching, even though p-start should match anything that p-separate does. I think that is for compatibility with old code that used these variables when they had a different spec. Please leave this alone. It would make a bigger transient and we don't want that now. As a first step to fixing the bugs in forward-paragraph, I've cleaned up the code somewhat and added some comments (though some of these are prolix enough to need removing later on). The patch below is intended only to clean up the code without changing its function. Should I commit this patch? Not yet. Please consider the issues above, first. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.] 2005-10-20 4:54 ` Richard M. Stallman @ 2005-10-20 13:53 ` Alan Mackenzie 2005-10-21 4:50 ` Richard M. Stallman 0 siblings, 1 reply; 20+ messages in thread From: Alan Mackenzie @ 2005-10-20 13:53 UTC (permalink / raw) On Thu, 20 Oct 2005, Richard M. Stallman wrote: [ .... ] > (ii) When `use-hard-newlines' is non-nil (i.e. Longlines Mode is > enabled), forward-paragraph can spuriously recognise a line "in the > middle of a paragraph" as a separator line when it "looks like" one. > (This only shows itself with a non-blank separator line.) >Sorry, I do not understand. With the following text line: First line of paragraph ;sep; second line of par third line , turn on Longlines Mode. The line acquires two "soft" newlines, and now looks like this: First line of paragraph ;sep; second line of par third line. The middle line, beginning with ";sep;", is spuriously recognised as a separator line by paragraph-separate. There is buggy code in forward-paragraph which tries to detect this situation. I will fix it. [ .... ] > (iii) When there is a non-nil fill-prefix, f-p uses it in place of > paragraph-start when searching backwards, but not when searching > forwards. Half of these cases is a bug. >I am not sure either one of them is right. What is right is to >implement the proper criterion--see the next issue. > Incidentally: The Elisp manual is a bit unclear on the page > "Paragraphs", where it says: > When there is a fill prefix, then paragraphs are delimited by all > lines which don't start with the fill prefix. > This doesn't make clear (? clear enough) whether this scheme of > terminating paragraphs is in addition to or in place of > paragraph-s\(tart\|eparate\). I think one of the words "also" and > "instead" should be inserted before "delimited". Which one? >It should be "in addition". OK. 
>If the fill-prefix is followed by text that would normally separate or start >paragraphs, that does separate or start paragraphs. Not quite: in the current implementation, paragraph-start is ignored when there's a fill-prefix. Non-separator lines without the fill-prefix always start paragraphs. What I've called "divider" lines below, are normal separator lines or lines with the prefix which are otherwise whitespace. The current implementation doesn't test for paragraph-s\(tart\|eparate\) on the same line as the fill-prefix. Should it? I'm trying to think of situations where somebody might want this. Perhaps they'd want an indentation to start a paragraph. Something like this: **********last line of paragraph. ********** First line of new paragraph. I've put together a tentative formulation of paragraph boundaries, (which doesn't as yet deal with this last situation). Please give me some feedback on it. ######################################################################### Definitions: (i) A @dfn{hard BOL} is the beginning of a line following a hard newline. (ii) A @dfn{separator (line)} is a line which paragraph-separate matches after any margin. (iii) A @dfn{starter (line)} is a line which paragraph-start matches after any margin, but which isn't a separator; (iv) A @dfn{divider (line)} is a line which is either a separator or has the fill-prefix (after any left margin) and is otherwise only whitespace. [This definition only applies when the fill-prefix is non-null.] (iv) The beginning and end of (the accessible portion of) the buffer always count as paragraph boundaries. [From this point on, "ALL paragraph boundaries" disregards BOB and EOB.] (v) All other paragraph boundaries are always at BOL, even when there is a left margin. (This is so that M-h C-w will always grab complete paragraphs.) (vi) When `use-hard-newlines' is non-nil, all paragraph boundaries are at hard BOLs. A paragraph starts at a non-separator line, and ends at the next hard BOL. 
Here, fill-prefix and paragraph-start are ignored. (vii) Otherwise, if fill-prefix is non-null: A paragraph starts at any non-divider line which either lacks the fill-prefix or follows a divider line. A paragraph ends at the start of the next paragraph or a divider line. Here, paragraph-start is ignored. (viii) Otherwise (fill-prefix is null), a paragraph starts at a non-separator line which is either a starter line or follows a separator line. It ends just before the next separator or starter line. (ix) If there happens to be a blank line before a paragraph start, this line is NOT regarded as being part of the paragraph. [This is the problem which was at the heart of this thread.] ######################################################################### -- Alan Mackenzie (Munich, Germany) ^ permalink raw reply [flat|nested] 20+ messages in thread
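Rule (viii) of the tentative formulation above, together with the rule that the beginning of the buffer counts as a boundary, can be modelled on a list of strings as follows. This is a sketch under stated assumptions only: null fill-prefix, no hard newlines, zero left margin, and lines given as strings instead of buffer text. The function names are hypothetical and this is not how forward-paragraph itself is written.

```elisp
;; Sketch only; these helpers are hypothetical, not part of Emacs.
(defun sketch-matches-at-bol (regexp line)
  "Return non-nil if REGEXP matches at the beginning of the string LINE."
  (string-match-p (concat "\\`\\(?:" regexp "\\)") line))

(defun sketch-paragraph-starts (lines separate start)
  "Return the 0-based indices of the lines in LINES that start a paragraph.
A paragraph starts at a non-separator line which either matches START
or follows a separator line.  The beginning of LINES is treated like a
preceding separator, since the beginning of the buffer is a boundary."
  (let ((prev-separator t)
        (index 0)
        (starts '()))
    (dolist (line lines)
      (let ((separator (sketch-matches-at-bol separate line)))
        (when (and (not separator)
                   (or prev-separator
                       (sketch-matches-at-bol start line)))
          (push index starts))
        (setq prev-separator separator
              index (1+ index))))
    (nreverse starts)))

;; Using the default values quoted from the Elisp manual at the top of
;; the thread.
(sketch-paragraph-starts
 '("First paragraph." "Second line." " Indented starter." "More text."
   "" "After separator.")
 "[ \t\f]*$" "[ \t\n\f]")
;; => (0 2 5)
```

Line 2 starts a paragraph because it matches the start regexp, line 5 because it follows a separator. Under rule (ix) the blank line at index 4 belongs to neither paragraph in this model, which is the point on which the thread disagrees with the existing behaviour.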
* Re: Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.]
  2005-10-20 13:53 ` Alan Mackenzie
@ 2005-10-21 4:50 ` Richard M. Stallman
  2005-10-21 20:09 ` Alan Mackenzie
  0 siblings, 1 reply; 20+ messages in thread
From: Richard M. Stallman @ 2005-10-21 4:50 UTC (permalink / raw)
  Cc: emacs-devel

    The current implementation doesn't test for paragraph-s\(tart\|eparate\)
    on the same line as the fill-prefix.  Should it?

I think it is important to see what past versions of Emacs did--for
instance, Emacs 19, before the support for a left margin was added.
If it was always done this way, then I think we should document it
clearly and not change it.  There are no users asking for changes in
this, and changing it would be risky.  However, if the past behavior
was confused or conflicting, we need to figure out which past behavior
to be compatible with.

Regarding your proposed definition of paragraphs, I am concerned about
possible incompatibilities.

In the "new" cases, those of use-hard-newlines and nonempty left margin,
we are not particularly bound by compatibility.  However, in the other
cases we are.

    (iv) A @dfn{divider (line)} is a line which is either a separator or
    has the fill-prefix (after any left margin) and is otherwise only
    whitespace.  [This definition only applies when the fill-prefix is
    non-null.]

I think that together with (vii) are very hard to understand.

    (vi) When `use-hard-newlines' is non-nil, all paragraph boundaries
    are at hard BOLs.  A paragraph starts at a non-separator line, and
    ends at the next hard BOL.  Here, fill-prefix and paragraph-start
    are ignored.

Does this make some unstated assumption about how separator lines and
hard newlines relate to each other?  Perhaps it is just that the text
is confusing.

    (ix) If there happens to be a blank line before a paragraph start,
    this line is NOT regarded as being part of the paragraph.  [This is
    the problem which was at the heart of this thread.]

I am not quite sure what that means in concrete terms.  As I said
before, that blank line MUST be part of the following paragraph.
That is essential for compatibility.

I think that at present we should probably stick to fixing anything
which is most obviously a bug.  For instance, all paragraph beginnings
and ends should be at BOL; when it fails to do that, that is worth
fixing.  Bigger changes should wait for after the release.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.] 2005-10-21 4:50 ` Richard M. Stallman @ 2005-10-21 20:09 ` Alan Mackenzie 2005-10-22 15:51 ` Richard M. Stallman 0 siblings, 1 reply; 20+ messages in thread From: Alan Mackenzie @ 2005-10-21 20:09 UTC (permalink / raw) On Fri, 21 Oct 2005, Richard M. Stallman wrote: > The current implementation doesn't test for paragraph-s\(tart\|eparate\) > on the same line as the fill-prefix. Should it? >I think it is important to see what past versions of Emacs did--for >instance, Emacs 19, before the support for a left margin was added. >If it was always done this way, then I think we should document it >clearly and not change it. There are no users asking for changes in >this, and changing it would be risky. However, if the past behavior >was confused or conflicting, we need to figure out which past behavior >to be compatible with. I think we are agreed, here: (i) The current implementation of forward-paragraph doesn't test for p-start/separate on the same lines as fill-prefixes; (ii) No users are clamouring for this facility; (iii) Any major modes which need this sort of thing can do so by setting p-start/separate appropriately (as CC Mode does). Let's leave it the way it already is! ;-) >Regarding your proposed definition of paragraphs, I am concerned about >possible incompatibilities. I'd like to stress I'm NOT trying to change the definition of paragraphs, merely to formulate the existing definition, which is to some extent embodied in forward-paragraph rather than being totally explicit. >In the "new" cases, those of use-hard-newlines and nonempty left margin, >we are not particularly bound by compatibility. However, in the other >cases we are. > (iv) A @dfn{divider (line)} is a line which is either a separator or > has the fill-prefix (after any left margin) and is otherwise only > whitespace. [This definition only applies when the fill-prefix is > non-null.]
>I think that together with (vii) are very hard to understand.

By "divider line" I was trying to say "a separator line when there's a
fill-prefix".  I didn't make a good job of it.  Sorry.  I've revised this
formulation extensively, removing this confusing term.  See below.

>    (vi) When `use-hard-newlines' is non-nil, all paragraph boundaries
>    are at hard BOLs.  A paragraph starts at a non-separator line, and
>    ends at the next hard BOL.  Here, fill-prefix and paragraph-start
>    are ignored.

>Does this make some unstated assumption about how separator lines and
>hard newlines relate to each other?  Perhaps it is just that the text
>is confusing.

The existing code tests `use-hard-newlines', which it considers
equivalent to Longlines Mode being enabled.  I don't think there're any
hidden assumptions there.  Merely that, conceptually, only hard newlines
are "real" newlines, since soft newlines are as fickle as SCO lawyers,
shifting around hither and thither as the text changes.  Thus, the only
meaningful place to look for a separator is just after a hard newline.

Is there any meaning for `use-hard-newlines' other than "Longlines Mode
is enabled"?

>    (ix) If there happens to be a blank line before a paragraph start,
>    this line is NOT regarded as being part of the paragraph.  [This is
>    the problem which was at the heart of this thread.]

>I am not quite sure what that means in concrete terms.  As I said
>before, that blank line MUST be part of the following paragraph.
>That is essential for compatibility.

The problem which started me off on all this was that of a blank line
belonging to two paragraphs, as in the following file:

------------------------------------------------------
1st Line [starter]
asdf

1st Line [starter]
asdf
-
Local Variables:
paragraph-separate: "-"
paragraph-start: "1st Line\\|-"
End:
-----------------------------------------------------

I think I understand now what's going on.

In the Emacs manual (page "Paragraphs") is:

    When you wish to operate on a paragraph, you can use the command
    `M-h' (`mark-paragraph') to set the region around it.  ....  If there
    are blank lines preceding the first line of the paragraph, one of
    these blank lines is included in the region.

The idea here is that you can do M-h C-w to kill a paragraph, move
somewhere else with M-{ and M-}, then insert it again with C-y.  All this
without having the hassle of manually deleting/inserting blank lines.

This has been implemented as (forward-paragraph -1) moving to the blank
line.  This is a misfeature, IMAO, no matter how long it may have been
so.  Surely `mark-paragraph' should be doing the job of including this
blank line, not forward-paragraph.  This blank line is NOT itself part of
the paragraph.

[Suggestions for Emacs 23:  "blank line" in the above should be
generalised here to "separator line".  We should make the definition of
paragraph-separate explicitly state that it matches AT MOST a single
line, so that separators can be found reliably whilst searching
backwards.  forward/backward-paragraph should be supplemented by (or even
superseded by) beginning/end-of-paragraph, which would work like
b/e-of-defun.  The "blank line preceding the paragraph" should be moved
into `mark-paragraph'.]

I discovered this whilst writing @dfn{Paragraphs} in Elisp's
searching.texi.  I wanted to write "Paragraphs don't overlap.", and felt
constrained to qualify it with "@footnote{In certain obscure
circumstances it is possible for a blank line to be both the last line of
one paragraph and the first line of the next.}".  I now think I should
just ignore this obscure case in the documentation, and fix the code
somehow and sometime for Emacs 23.

>I think that at present we should probably stick to fixing anything
>which is most obviously a bug.  For instance, all paragraph beginnings
>and ends should be at BOL; when it fails to do that, that is worth
>fixing.
>Bigger changes should wait for after the release.

OK.  There are several bugs in forward-paragraph.  I will fix them.  The
easiest way to fix them is with a thoroughgoing refactoring of the
function (which I have already done).  I suspect, though, you will prefer
the basic structure of the code to be left unchanged.  Please confirm or
deny this!

Here is my formulation of paragraph boundaries, thoroughly revised and
incorporating the comments you've made:

Note: Items enclosed in braces are purely for clarification.

#########################################################################
DEFINITIONS:
(i) A @dfn{hard BOL} is the beginning of a line following a hard newline.
(ii) A @dfn{separator (line)} is a line which separates paragraphs
  without being part of one.
(iii) A @dfn{starter (line)} is a line which, when present, always begins
  a paragraph.  {Note that not every paragraph need begin with a starter.}
(iv) {A line cannot be both a starter and a separator.}

SPECIAL HANDLING OF A PRECEDING BLANK LINE:
(v) In all of the following, if there should happen to be a blank line
  immediately preceding the beginning of a paragraph, this beginning will
  be modified to include the blank line.  A "blank line" here is one
  which contains only whitespace, and no more than a left margin's worth
  of it.

SPECIAL STUFF ABOUT BOB/EOB:
(vi) The beginning and end of (the accessible portion of) the buffer
  always count as paragraph boundaries.
  [From this point on, "ALL paragraph boundaries" disregards BOB and EOB.]
(vii) All other paragraph boundaries are at BOL, even when there is a
  left margin.  {This is so that M-h C-w will always grab complete
  paragraphs.}

CORE OF PARAGRAPH DEFINITION:
(viii) A paragraph starts either at a starter, or at a line which isn't a
  separator, yet follows one.
(ix) A paragraph ends at a starter or a separator line.

WHEN use-hard-newlines IS NON-NIL {"Longlines Mode"}:
(x) {All paragraph boundaries are at hard BOLs.
  A single "long line" is regarded as a paragraph.}
(xi) A separator is a line at a hard BOL, and which matches
  paragraph-separate at its left margin.
(xii) A starter is any line at a hard BOL which isn't a separator.
(xiii) {fill-prefix and paragraph-start play no role here.}

OTHERWISE, WHEN fill-prefix IS NON-NULL:
(xiv) A separator is a line which either:
  (a) matches paragraph-separate at its left margin; or
  (b) contains a valid fill-prefix and is otherwise blank (WS is allowed).
(xv) A starter is a line which isn't a separator and lacks a valid
  fill-prefix.
(xvi) {paragraph-start plays no role, here.}

OTHERWISE, (use-hard-newlines AND fill-prefix ARE BOTH NULL):
(xvii) A separator is a line which matches paragraph-separate at its left
  margin.
(xviii) A starter is a line which isn't a separator and matches
  paragraph-start at its left margin.
#########################################################################

-- 
Alan Mackenzie (Munich, Germany)

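[Editorial sketch, not part of the original message.]  The last case of
the formulation above (rules (xvii) and (xviii), combined with (v), (viii)
and (ix)) can be modelled roughly as follows.  This is a Python model, not
the Emacs implementation.  It assumes a left margin of zero, a null
fill-prefix and use-hard-newlines being nil.  The regexps are written in
Python syntax rather than Emacs syntax, the function names `classify' and
`paragraph_spans' are hypothetical, and the sample buffer assumes that the
test file in the thread had one blank line between its two paragraphs.

```python
import re


def classify(line, paragraph_start, paragraph_separate):
    """Classify one line per (xvii)/(xviii): separator wins over starter."""
    if re.match(paragraph_separate, line):
        return "separator"
    if re.match(paragraph_start, line):
        return "starter"
    return "other"


def paragraph_spans(lines, paragraph_start, paragraph_separate):
    """Return (start, end) line-index pairs, end exclusive, one per paragraph."""
    kinds = [classify(l, paragraph_start, paragraph_separate) for l in lines]
    spans = []
    for i, kind in enumerate(kinds):
        # Rule (viii): a paragraph starts at a starter, or at a non-separator
        # that follows a separator.  Rule (vi): BOB is also a boundary.
        begins = kind == "starter" or (
            kind == "other" and (i == 0 or kinds[i - 1] == "separator")
        )
        if not begins:
            continue
        # Rule (ix): the paragraph ends at the next starter or separator.
        end = i + 1
        while end < len(lines) and kinds[end] == "other":
            end += 1
        # Rule (v): pull the beginning back over one preceding blank line
        # (with a zero left margin, "blank" means an empty line here).
        start = i - 1 if i > 0 and lines[i - 1] == "" else i
        spans.append((start, end))
    return spans


# Assumed reconstruction of the test file from the thread.
sample = ["1st Line [starter]", "asdf", "", "1st Line [starter]", "asdf", "-"]
spans = paragraph_spans(sample, r"1st Line|-", r"-")
print(spans)
```

With these inputs the blank line (index 2) falls inside both spans, which
is the overlap discussed in the thread.  With regexps under which a blank
line is a separator, the blank line belongs only to the following
paragraph.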
* Re: Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.]
  2005-10-21 20:09 ` Alan Mackenzie
@ 2005-10-22 15:51 ` Richard M. Stallman
  0 siblings, 0 replies; 20+ messages in thread
From: Richard M. Stallman @ 2005-10-22 15:51 UTC
Cc: emacs-devel

    The problem which started me off on all this was that of a blank line
    belonging to two paragraphs, as in the following file:

The bug there is that the blank line is treated as part of the
_preceding_ paragraph.  It should be part of the following paragraph,
and _not_ part of the preceding paragraph.  Can you fix that bug in
forward-paragraph?

    This has been implemented as (forward-paragraph -1) moving to the blank
    line.  This is a misfeature, IMAO, no matter how long it may have been
    so.

I don't think so.  I normally want M-{ and M-} to stop in the blank line
between paragraphs.  Normally that blank line is a separator, so it
will happen anyway.  But as long as that special feature for blank
lines remains, it should apply to M-} and M-{, just as to M-h.

    [Suggestions for Emacs 23:  "blank line" in the above should be
    generalised here to "separator line".

I don't understand what that would mean.

    We should make the definition of
    paragraph-separate explicitly state that it matches AT MOST a single
    line, so that separators can be found reliably whilst searching
    backwards.

It can't hurt to add this to the documentation, since that assumption
is already made.  We could do it now.

    forward/backward-paragraph should be supplemented by (or even
    superseded by) beginning/end-of-paragraph, which would work like
    b/e-of-defun.

I have nothing against it, but I don't see much use for it, and there
are no keys to put them on.

    OK.  There are several bugs in forward-paragraph.  I will fix them.  The
    easiest way to fix them is with a thoroughgoing refactoring of the
    function (which I have already done).  I suspect, though, you will prefer
    the basic structure of the code to be left unchanged.
    Please confirm or deny this!

For now, I'd rather stick with the basic structure of the code.
After the release, we could refactor it.

end of thread, other threads:[~2005-10-22 15:51 UTC | newest]

Thread overview: 20+ messages
2005-08-30 10:50 Beginingless paragraphs Alan Mackenzie
2005-08-30 11:48 ` Benjamin Riefenstahl
2005-08-31 14:36 ` Richard M. Stallman
2005-08-31 17:00 ` Eli Zaretskii
2005-08-31 18:11 ` Alan Mackenzie
2005-09-01 15:53 ` Richard M. Stallman
2005-09-01 17:56 ` Alan Mackenzie
2005-09-01 23:17 ` Thien-Thi Nguyen
2005-09-03  1:42 ` Richard M. Stallman
2005-09-03  1:41 ` Richard M. Stallman
2005-09-03 12:26 ` Alan Mackenzie
2005-09-04 16:49 ` Richard M. Stallman
2005-09-07 19:17 ` Beginingless paragraphs: second stab at a patch Alan Mackenzie
2005-09-08  9:04 ` Richard M. Stallman
2005-10-19 16:56 ` Clean-up of forward-paragraph [Re: Beginingless paragraphs: second stab at a patch.] Alan Mackenzie
2005-10-20  4:54 ` Richard M. Stallman
2005-10-20 13:53 ` Alan Mackenzie
2005-10-21  4:50 ` Richard M. Stallman
2005-10-21 20:09 ` Alan Mackenzie
2005-10-22 15:51 ` Richard M. Stallman