unofficial mirror of emacs-devel@gnu.org 
* paragraphs.el: do forward-sentence and friends not work?
@ 2008-02-13 17:08 David Reitter
  2008-02-13 17:27 ` Andreas Schwab
  0 siblings, 1 reply; 24+ messages in thread
From: David Reitter @ 2008-02-13 17:08 UTC (permalink / raw)
  To: emacs-pretest-bug

Maybe I misunderstand the purpose of this function, but
`forward-sentence' doesn't work for me. It always jumps to the end of the
paragraph rather than to the end of the sentence.

This is in recent GNU Emacs 22 CVS builds.

It appears that there may be one or more bugs in the definition of  
`sentence-end' and the default of sentence-end-base.

(sentence-end) returns:

"\\([.?!][]\"'””)}]*\\($\\| $\\|	\\|  \\)\\| 
[。.?!。.?!。.?!����]+\\)[ 	
]*"

Aren't . and ? supposed to be escaped?

Even when escaped, I couldn't make it work. However, if I set
sentence-end:

(setq sentence-end "\\([\\.\\?!][ \\\\]\\)\\|\n")

things work alright (for basic sentences).

Maybe an Emacs-specific meaning associated with "sentence" has
eluded me so far. Or, is there something broken in M-e and M-a?




* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 17:08 paragraphs.el: do forward-sentence and friends not work? David Reitter
@ 2008-02-13 17:27 ` Andreas Schwab
  2008-02-13 17:32   ` David Reitter
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2008-02-13 17:27 UTC (permalink / raw)
  To: David Reitter; +Cc: emacs-pretest-bug

David Reitter <david.reitter@gmail.com> writes:

> Maybe I misunderstand the purpose of this function, but
> `forward-sentence' doesn't work for me. It always jumps to the end of the
> paragraph rather than to the end of the sentence.

You need to set sentence-end-double-space to nil.
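
For reference, a minimal sketch of what that setting might look like in an init file. The hook function name `my/single-space-sentences' is made up for illustration; only `sentence-end-double-space' itself comes from paragraphs.el.

```elisp
;; Treat a period followed by a single space as a sentence end, everywhere.
(setq sentence-end-double-space nil)

;; Alternatively, restrict the change to text-mode buffers.
;; `my/single-space-sentences' is a hypothetical name, not part of Emacs.
(defun my/single-space-sentences ()
  "Make a single space after a period end a sentence in this buffer."
  (set (make-local-variable 'sentence-end-double-space) nil))
(add-hook 'text-mode-hook #'my/single-space-sentences)
```

Either form alone should be enough; the hook variant leaves the global default untouched for other modes.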

> "\\([.?!][]\"'””)}]*\\($\\| $\\|	\\|  \\)\\|
> [。.?!。.?!。.?!����]+\\)[ 	
> ]*"
>
> Aren't . and ? supposed to be escaped?

No, they are not special inside bracket expressions.
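
A quick check that can be evaluated in *scratch* to see this, using only `string-match' (the sample strings are arbitrary):

```elisp
;; Inside [...] the characters . and ? stand for themselves.
(string-match "[.?!]" "Really?")   ; => 6, the index of the literal "?"
(string-match "[.?!]" "no match")  ; => nil, "." is not "any character" here

;; Outside a bracket expression, an unescaped "." matches any character.
(string-match "." "abc")           ; => 0
```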

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 17:27 ` Andreas Schwab
@ 2008-02-13 17:32   ` David Reitter
  2008-02-13 20:00     ` Stephen J. Turnbull
                       ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: David Reitter @ 2008-02-13 17:32 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-pretest-bug

On 13 Feb 2008, at 17:27, Andreas Schwab wrote:
>>
>> Maybe I misunderstand the purpose of this function, but
>> `forward-sentence' doesn't work for me. It always jumps to the end of the
>> paragraph rather than to the end of the sentence.
>
> You need to set sentence-end-double-space to nil.

That works. Why is this variable not nil by default?






* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 17:32   ` David Reitter
@ 2008-02-13 20:00     ` Stephen J. Turnbull
  2008-02-14  4:42       ` Richard Stallman
  2008-02-14  9:10       ` David Reitter
  2008-02-13 20:36     ` Stefan Monnier
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Stephen J. Turnbull @ 2008-02-13 20:00 UTC (permalink / raw)
  To: David Reitter; +Cc: Andreas Schwab, emacs-pretest-bug

David Reitter writes:

 > On 13 Feb 2008, at 17:27, Andreas Schwab wrote:
 > >>
 > >> Maybe I misunderstand the purpose of this function, but
 > >> `forward-sentence' doesn't work for me. It always jumps to the end of the
 > >> paragraph rather than to the end of the sentence.
 > >
 > > You need to set sentence-end-double-space to nil.
 > 
 > That works. Why is this variable not nil by default?

Because AFAIK it's still Chicago Manual of Style-standard to follow a
sentence-ending period with two spaces in typed documents.  (Even with
a proportional font, Emacs will not insert extra visual space to
delimit a sentence unless it's present in the document.)

Many people find this more readable; the only people I've ever seen
say it's less readable undercut the argument by simultaneously
complaining about the inefficiency of typing multiple spaces.  So I
think sentence-end-double-space should default to t to encourage
readability.






* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 17:32   ` David Reitter
  2008-02-13 20:00     ` Stephen J. Turnbull
@ 2008-02-13 20:36     ` Stefan Monnier
  2008-02-13 20:52       ` Thorsten Bonow
  2008-02-13 23:06     ` Miles Bader
  2008-02-14  2:18     ` Robert J. Chassell
  3 siblings, 1 reply; 24+ messages in thread
From: Stefan Monnier @ 2008-02-13 20:36 UTC (permalink / raw)
  To: David Reitter; +Cc: Andreas Schwab, emacs-pretest-bug

>>> Maybe I misunderstand the purpose of this function, but
>>> `forward-sentence' doesn't work for me. It always jumps to the end of the
>>> paragraph rather than to the end of the sentence.
>> 
>> You need to set sentence-end-double-space to nil.

> That works. Why is this variable not nil by default?

The problem is that "half the world" doesn't know that "the other half"
uses 2-spaces after a ".".  So the docstring of
sentence-end-double-space should probably explain that this is
a convention used throughout North America (AFAICT).


        Stefan






* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 20:36     ` Stefan Monnier
@ 2008-02-13 20:52       ` Thorsten Bonow
  0 siblings, 0 replies; 24+ messages in thread
From: Thorsten Bonow @ 2008-02-13 20:52 UTC (permalink / raw)
  To: monnier; +Cc: david.reitter, emacs-pretest-bug, schwab

>>>>> "Stefan" == Stefan Monnier <monnier@iro.umontreal.ca> writes:

    [...]

    Stefan> The problem is that "half the world" doesn't know that "the other
    Stefan> half" uses 2-spaces after a ".".  So the docstring of
    Stefan> sentence-end-double-space should probably explain that this is a
    Stefan> convention used throughout North America (AFAICT).

Funnily enough, this convention used in the English-speaking world is called
"French spacing" or (no joke) "non-French spacing". The meaning became
reversed over time (in some countries). Wikipedia tries to enlighten fearless
readers :-)

http://en.wikipedia.org/wiki/French_spacing




-- 
Contact information and PGP key at
http://www-users.rwth-aachen.de/thorsten.bonow

When your world is full of strange arrangements
And gravity wont pull you through
You know youre missing out on something
Well that something depends on you
			ABC -- The Look of Love





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 17:32   ` David Reitter
  2008-02-13 20:00     ` Stephen J. Turnbull
  2008-02-13 20:36     ` Stefan Monnier
@ 2008-02-13 23:06     ` Miles Bader
  2008-02-14  2:18     ` Robert J. Chassell
  3 siblings, 0 replies; 24+ messages in thread
From: Miles Bader @ 2008-02-13 23:06 UTC (permalink / raw)
  To: David Reitter; +Cc: Andreas Schwab, emacs-pretest-bug

David Reitter <david.reitter@gmail.com> writes:
>> You need to set sentence-end-double-space to nil.
>
> That works. Why is this variable not nil by default?

Because we like it to be t.

-Miles

-- 
Genealogy, n. An account of one's descent from an ancestor who did not
particularly care to trace his own.





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 17:32   ` David Reitter
                       ` (2 preceding siblings ...)
  2008-02-13 23:06     ` Miles Bader
@ 2008-02-14  2:18     ` Robert J. Chassell
  3 siblings, 0 replies; 24+ messages in thread
From: Robert J. Chassell @ 2008-02-14  2:18 UTC (permalink / raw)
  To: emacs-devel

    > You need to set sentence-end-double-space to nil.

    That works. Why is this variable not nil by default?

Because in North American English, the custom developed to end
sentences with double spaces.  With single spaces between sentences, I
find text less readable.  (I don't know the custom in UK English or in
other languages.)  In many ways, the custom is similar to that for
punctuation:  a semi-colon indicates a longer pause than a comma and a
period indicates a longer pause than a semi-colon.  (A `period' in
American English is a `fullstop' in UK English.  As far as I can see,
the British word makes more sense than the American.)

-- 
    Robert J. Chassell                          GnuPG Key ID: 004B4AC8
    bob@rattlesnake.com                         bob@gnu.org
    http://www.rattlesnake.com                  http://www.teak.cc





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 20:00     ` Stephen J. Turnbull
@ 2008-02-14  4:42       ` Richard Stallman
  2008-02-14  9:45         ` David Reitter
  2008-02-14  9:10       ` David Reitter
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Stallman @ 2008-02-14  4:42 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: david.reitter, emacs-pretest-bug, schwab

Using two spaces after end of sentence enables Emacs to distinguish
between periods that end sentences and periods for abbreviations.
That is why it should be the default.
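
To make the difference concrete, here is a small sketch that runs `forward-sentence' over the same text with both settings. The helper name `my/first-sentence' and the sample text are invented for this illustration; the behavior shown assumes the default `sentence-end-base' and a nil value of the variable `sentence-end'.

```elisp
(defun my/first-sentence (text double-space)
  "Return the first sentence of TEXT as `forward-sentence' sees it.
DOUBLE-SPACE is the value bound to `sentence-end-double-space'.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Bind the option only for the duration of the motion command.
    (let ((sentence-end-double-space double-space))
      (forward-sentence 1))
    (buffer-substring (point-min) (point))))

;; With two spaces required, the period in "e.g." is skipped:
(my/first-sentence "See e.g. the manual.  It explains this." t)
;; => "See e.g. the manual."

;; With a single space accepted, the abbreviation ends the "sentence":
(my/first-sentence "See e.g. the manual.  It explains this." nil)
;; => "See e.g."
```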





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-13 20:00     ` Stephen J. Turnbull
  2008-02-14  4:42       ` Richard Stallman
@ 2008-02-14  9:10       ` David Reitter
  2008-02-14  9:22         ` Miles Bader
  2008-02-14 10:44         ` Stephen J. Turnbull
  1 sibling, 2 replies; 24+ messages in thread
From: David Reitter @ 2008-02-14  9:10 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Andreas Schwab, emacs-pretest-bug

On 13 Feb 2008, at 20:00, Stephen J. Turnbull wrote:
> Because AFAIK it's still Chicago Manual of Style-standard to follow a
> sentence-ending period with two spaces in typed documents.  (Even with
> a proportional font, Emacs will not insert extra visual space to
> delimit a sentence unless it's present in the document.)

I don't have the Chicago Manual of Style here, but may I quote from  
Wikipedia (from the French_spacing article):

===
Recently some widely-used American style guides, notably the Chicago
Manual of Style, call for a single space after full stops and colons.
In chapter 6 Punctuation section 3 Typographic and Aesthetic
Considerations, for example, the Chicago Manual of Style states:

     6.11 Space between sentences

     In typeset matter, one space, not two (in other words, a regular
     word space), follows any mark of punctuation [sic] that ends a
     sentence, whether a period, a colon [sic], a question mark, an
     exclamation point, or closing quotation marks.

The FAQ to the Chicago Manual of Style explicitly states that the
"traditional American practice" is to double-space after colons and
periods (without mentioning semi-colons) but then states that "This
practice is discouraged by the University of Chicago Press".
===

But whatever the manual says, why impose on the user?


On 13 Feb 2008, at 20:36, Stefan Monnier wrote:

> The problem is that "half the world" doesn't know that "the other  
> half"
> uses 2-spaces after a ".".  So the docstring of
> sentence-end-double-space should probably explain that this is
> a convention used throughout North America (AFAICT).

I wonder whether this is something that should be part of MULE.
Or won't setting `sentence-end-double-space' to nil make it work in
most other cases?
The regular expression could be improved to recognize abbreviations  
correctly.








* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  9:10       ` David Reitter
@ 2008-02-14  9:22         ` Miles Bader
  2008-02-14  9:46           ` David Reitter
  2008-02-14 10:44         ` Stephen J. Turnbull
  1 sibling, 1 reply; 24+ messages in thread
From: Miles Bader @ 2008-02-14  9:22 UTC (permalink / raw)
  To: David Reitter; +Cc: Andreas Schwab, Stephen J. Turnbull, emacs-pretest-bug

David Reitter <david.reitter@gmail.com> writes:
> I wonder whether this is something that should be part of MULE.

What does it have to do with MULE?

-Miles

-- 
Arrest, v. Formally to detain one accused of unusualness.





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  4:42       ` Richard Stallman
@ 2008-02-14  9:45         ` David Reitter
  2008-02-14 14:22           ` Robert J. Chassell
                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: David Reitter @ 2008-02-14  9:45 UTC (permalink / raw)
  To: rms; +Cc: schwab, Stephen J. Turnbull, emacs-pretest-bug

On 14 Feb 2008, at 04:42, Richard Stallman wrote:

> Using two spaces after end of sentence enables Emacs to distinguish
> between periods that end sentences and periods for abbreviations.
> That is why it should be the default.


We can improve this to make it work without depending on the
double-space.

Sentence tokenization is a known problem. You can throw machine  
learning algorithms at it, but that's not a viable option in our case.  
However, Grefenstette & Tapanainen (1994) examined this in detail for
English, using the Brown corpus. They basically say that using a small  
lexicon of common abbreviations, they can classify 99.1% of all  
periods correctly. Even without the lexicon, you can achieve 97.7%  
accuracy (on English) using the right regular expressions, and I think  
this will be similar for other languages as well. I think that's good  
enough for M-e and M-a.

http://citeseer.ist.psu.edu/grefenstette94what.html
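
A rough sketch of the lexicon idea, to show the shape such a heuristic could take. This is not the method from the paper and not code from paragraphs.el: the abbreviation list and the names `my/abbreviations' and `my/split-sentences' are invented for this illustration, and a real implementation would need a much larger lexicon and per-language rules.

```elisp
(defvar my/abbreviations '("e.g." "i.e." "etc." "cf." "vs." "Dr." "Mr." "Mrs.")
  "Hypothetical mini-lexicon of abbreviations that do not end a sentence.")

(defun my/split-sentences (text)
  "Split TEXT into a list of sentences, allowing single-space sentence ends.
A word ending in `.', `?' or `!' followed by whitespace ends a sentence
unless the word is listed in `my/abbreviations'.  Illustration only."
  (let ((start 0) (pos 0) (sentences '()))
    ;; Group 1 is the whole word including its final punctuation mark.
    (while (string-match "\\([^ \t\n]+[.?!]\\)[ \t\n]+" text pos)
      (setq pos (match-end 0))
      (unless (member (match-string 1 text) my/abbreviations)
        (push (substring text start (match-end 1)) sentences)
        (setq start pos)))
    ;; Whatever remains after the last boundary is the final sentence.
    (when (< start (length text))
      (push (substring text start) sentences))
    (nreverse sentences)))

(my/split-sentences "See e.g. the manual. It explains this.")
;; => ("See e.g. the manual." "It explains this.")
```

The lexicon lookup is what keeps "e.g." from being treated as a boundary; without it, the same input would split after the abbreviation.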



  





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  9:22         ` Miles Bader
@ 2008-02-14  9:46           ` David Reitter
  2008-02-14 10:07             ` Miles Bader
  0 siblings, 1 reply; 24+ messages in thread
From: David Reitter @ 2008-02-14  9:46 UTC (permalink / raw)
  To: Miles Bader; +Cc: Andreas Schwab, Stephen J. Turnbull, emacs-pretest-bug

On 14 Feb 2008, at 09:22, Miles Bader wrote:

> David Reitter <david.reitter@gmail.com> writes:
>> I wonder whether this is something that should be part of
>> MULE.
>
> What does it have to do with MULE?

People here have argued that French spacing is a language-specific  
matter. (Of course it's not a question of encodings.)





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  9:46           ` David Reitter
@ 2008-02-14 10:07             ` Miles Bader
  0 siblings, 0 replies; 24+ messages in thread
From: Miles Bader @ 2008-02-14 10:07 UTC (permalink / raw)
  To: David Reitter; +Cc: Andreas Schwab, Stephen J. Turnbull, emacs-pretest-bug

David Reitter <david.reitter@gmail.com> writes:
>>> I wonder whether this is something that should be part of
>>> MULE.
>>
>> What does it have to do with MULE?
>
> People here have argued that French spacing is a language-specific
> matter. (Of course it's not a question of encodings.)

Actually, what I wish is that I could separate the "recognition"
functionality (M-a, M-e, fill, etc) of sentence-end-double-space from
the "canonicalize" functionality (in fill).

This is because I often edit text written by other people.  I'd like M-a
and M-e to continue working on their text (even though I personally prefer
to use two spaces), and indeed, I'd like filling to recognize those
single-space ends of sentences and replace them with two spaces (it
currently can only do the reverse).

Of course this would work better with a more intelligent recognition of
single-space sentence ends, as you mention.
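
One way to approximate the "recognition only" half today, as a sketch: wrap the motion command and bind `sentence-end-double-space' just for its duration, so the global value (and therefore filling) is left alone. The command name `my/forward-sentence-lenient' is made up for this example and is not part of Emacs.

```elisp
(defun my/forward-sentence-lenient (&optional arg)
  "Move forward ARG sentences, accepting one space after the period.
The global value of `sentence-end-double-space' is not changed, so
filling keeps its usual behavior.  Hypothetical command, illustration only."
  (interactive "p")
  ;; Dynamic binding: visible to `forward-sentence', gone afterwards.
  (let ((sentence-end-double-space nil))
    (forward-sentence (or arg 1))))
```

This does nothing for the abbreviation problem; it only relaxes recognition for the one command that is wrapped.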

-Miles

-- 
Once, adj. Enough.





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  9:10       ` David Reitter
  2008-02-14  9:22         ` Miles Bader
@ 2008-02-14 10:44         ` Stephen J. Turnbull
  2008-02-14 12:27           ` David Reitter
  1 sibling, 1 reply; 24+ messages in thread
From: Stephen J. Turnbull @ 2008-02-14 10:44 UTC (permalink / raw)
  To: David Reitter; +Cc: Andreas Schwab, emacs-pretest-bug

David Reitter writes:

 >      In typeset matter,

Emacs is not a typesetter.

 > The regular expression could be improved to recognize abbreviations  
 > correctly.

I couldn't care less about what regular expressions can recognize.  My
eyes are not regular-expression matchers; the amount of whitespace
matters greatly to readability.

Do you wish to maintain the opposite, that the extra whitespace makes
frenchspaced text less readable?





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14 10:44         ` Stephen J. Turnbull
@ 2008-02-14 12:27           ` David Reitter
  2008-02-14 22:25             ` Stephen J. Turnbull
  0 siblings, 1 reply; 24+ messages in thread
From: David Reitter @ 2008-02-14 12:27 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Andreas Schwab, emacs-pretest-bug

On 14 Feb 2008, at 10:44, Stephen J. Turnbull wrote:
>
> Do you wish to maintain the opposite, that the extra whitespace makes
> frenchspaced text less readable?

Oh no, and that wasn't my point.

You said that the Emacs default is to require users to use double  
spacing "because AFAIK it's still Chicago Manual of Style-standard to  
follow a
sentence-ending period with two spaces".

Now I've quoted the Chicago manual of style from the Wikipedia entry,  
saying the exact opposite.

By your logic, Emacs would have to adopt a different default then.

That said, while I do believe that the double-spacing is an  
improvement to readability, I want to be able to edit other people's  
texts. (It is a useless attitude to impose on users a writing style
uncommon outside the U.S.--crippling the M-e / M-a features does not  
help anyone.)








* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  9:45         ` David Reitter
@ 2008-02-14 14:22           ` Robert J. Chassell
  2008-02-14 14:43           ` Stefan Monnier
  2008-02-15  0:02           ` Richard Stallman
  2 siblings, 0 replies; 24+ messages in thread
From: Robert J. Chassell @ 2008-02-14 14:22 UTC (permalink / raw)
  To: emacs-devel

Note that a 13th edition of the Chicago Manual of Style, copyright
1982, does not speak of end of line spaces for regular text, but its
physical embodiment puts more space after a sentence than between
words within a sentence.  (I have not seen a more recent Chicago
Manual of Style, physical or otherwise.)

    But whatever the manual says, why impose on the user?

As of this morning, 2008 Feb 14, we see in

    http://en.wikipedia.org/wiki/French_spacing

this:

        * "rivers" of whitespace do not distract readers

        * widened spaces between sentences improve reader
          comprehension and reader comfort

    Unusually for sociological research -- extraordinarily so -- no
    valid or even scientific studies have materially contradicted
    these findings.

-- 
    Robert J. Chassell                          GnuPG Key ID: 004B4AC8
    bob@rattlesnake.com                         bob@gnu.org
    http://www.rattlesnake.com                  http://www.teak.cc





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  9:45         ` David Reitter
  2008-02-14 14:22           ` Robert J. Chassell
@ 2008-02-14 14:43           ` Stefan Monnier
  2008-02-14 15:52             ` David Reitter
  2008-06-13 14:14             ` David Reitter
  2008-02-15  0:02           ` Richard Stallman
  2 siblings, 2 replies; 24+ messages in thread
From: Stefan Monnier @ 2008-02-14 14:43 UTC (permalink / raw)
  To: David Reitter; +Cc: schwab, Stephen J. Turnbull, rms, emacs-pretest-bug

>> Using two spaces after end of sentence enables Emacs to distinguish
>> between periods that end sentences and periods for abbreviations.
>> That is why it should be the default.

> We can improve this to make it work without depending on the
> double-space.

> Sentence tokenization is a known problem. You can throw machine learning
> algorithms at it, but that's not a viable option in our case.  However,
> Grefenstette&Tapanainen (1994) examined this in detail for  English, using
> the Brown corpus. They basically say that using a small  lexicon of common
> abbreviations, they can classify 99.1% of all  periods correctly. Even
> without the lexicon, you can achieve 97.7%  accuracy (on English) using the
> right regular expressions, and I think  this will be similar for other
> languages as well. I think that's good  enough for M-e and M-a.

But the period-single-space vs period-double-space distinction allows us
to get it right 100% in many more languages than just English.


        Stefan "Who switched to non-French spacing even when writing French"





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14 14:43           ` Stefan Monnier
@ 2008-02-14 15:52             ` David Reitter
  2008-02-14 16:04               ` Miles Bader
  2008-06-13 14:14             ` David Reitter
  1 sibling, 1 reply; 24+ messages in thread
From: David Reitter @ 2008-02-14 15:52 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: schwab, Stephen J. Turnbull, rms, emacs-pretest-bug

[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]

On 14 Feb 2008, at 14:43, Stefan Monnier wrote:
>>
>> without the lexicon, you can achieve 97.7%  accuracy (on English)  
>> using the
>> right regular expressions, and I think  this will be similar for  
>> other
>> languages as well. I think that's good  enough for M-e and M-a.
>
> But the period-single-space vs period-double-space distinction  
> allows us
> to get it right 100% in many more languages than just English.


Do people write like this in other languages?
From our discussion here today I take it that they don't.  Thus, your
accuracy may be more like 20% in other languages or even parts of the
world, assuming that one in five either adopt American conventions or
find the customization variable.  This will probably work or it won't,
depending on the particular user.

If my Aquamacs statistics (as attached) are a representative sample,  
about half the Emacs users are located outside the US.  (And I'm  
pretty sure that there is a very strong Japanese population that skews  
this even further).  How many of these commonly use double-spacing?

Do you want users to adapt to the software, or do you want the  
software to provide what is needed to deal with a user's needs?

Consider that variable width fonts are more common now.  I do most  
things using variable-width fonts, including LaTeX editing.  Maybe  
that is why I never bothered with double spacing, even though I quite  
like it in principle.

I'll shut up with this.  Still, should people decide they want a patch  
using the Grefenstette et al. method (or even something more modern),  
I'd be happy to work that out.





[-- Attachment #2: countries.pdf --]
[-- Type: application/pdf, Size: 11804 bytes --]





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14 15:52             ` David Reitter
@ 2008-02-14 16:04               ` Miles Bader
  2008-02-15  5:48                 ` Jonathan Rockway
  0 siblings, 1 reply; 24+ messages in thread
From: Miles Bader @ 2008-02-14 16:04 UTC (permalink / raw)
  To: David Reitter
  Cc: schwab, Stephen J. Turnbull, Stefan Monnier, emacs-pretest-bug,
	rms

David Reitter <david.reitter@gmail.com> writes:
> If my Aquamacs statistics (as attached) are a representative sample,
> about half the Emacs users are located outside the US.  (And I'm  pretty
> sure that there is a very strong Japanese population that skews  this
> even further).  How many of these commonly use double-spacing?

Japanese doesn't use spacing at all, really...

-Miles

-- 
Generous, adj. Originally this word meant noble by birth and was rightly
applied to a great multitude of persons. It now means noble by nature and is
taking a bit of a rest.





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14 12:27           ` David Reitter
@ 2008-02-14 22:25             ` Stephen J. Turnbull
  0 siblings, 0 replies; 24+ messages in thread
From: Stephen J. Turnbull @ 2008-02-14 22:25 UTC (permalink / raw)
  To: David Reitter; +Cc: emacs-pretest-bug

David Reitter writes:

 > Now I've quoted the Chicago manual of style from the Wikipedia entry,  
 > saying the exact opposite.

You quoted a section that referred to typeset material, which is
irrelevant.  Emacs is equivalent to a typewriter; if you want (badly)
typeset text, use Word or OOo.  Quote me an equivalent section from
the CMOS on manuscripts (I don't have my CMOS here, but I'm pretty
sure that it recommends frenchspacing for typescript), and then I'll
take notice.  Note that proportionally-spaced fonts != typesetting.

 > It is a useless attitude to impose on users a writing style
 > uncommon outside the U.S.--crippling the M-e / M-a features does
 > not help anyone.

Sentence movement is neither imposed nor crippled; you just have to
toggle it to deal with a custom intended for the convenience of
writers rather than readers.

And no, it's not useless to encourage useful standards.  I think that
this default is a good thing, since readers don't always have much
input into style and get the most godawful crap jammed into their
inboxes.  As with litter, every little bit helps.





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14  9:45         ` David Reitter
  2008-02-14 14:22           ` Robert J. Chassell
  2008-02-14 14:43           ` Stefan Monnier
@ 2008-02-15  0:02           ` Richard Stallman
  2 siblings, 0 replies; 24+ messages in thread
From: Richard Stallman @ 2008-02-15  0:02 UTC (permalink / raw)
  To: David Reitter; +Cc: schwab, stephen, emacs-pretest-bug

    Sentence tokenization is a known problem. You can throw machine  
    learning algorithms at it, but that's not a viable option in our case.  
    However, Grefenstette&Tapanainen (1994) examined this in detail for  
    English, using the Brown corpus. They basically say that using a small  
    lexicon of common abbreviations, they can classify 99.1% of all  
    periods correctly. Even without the lexicon, you can achieve 97.7%  
    accuracy (on English) using the right regular expressions, and I think  
    this will be similar for other languages as well. I think that's good  
    enough for M-e and M-a.

    http://citeseer.ist.psu.edu/grefenstette94what.html

I encourage someone to implement this; then we will see how well it
works.  If it works well, we could set sentence-end-double-space to
nil for languages where this feature makes it an improvement.






* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14 16:04               ` Miles Bader
@ 2008-02-15  5:48                 ` Jonathan Rockway
  0 siblings, 0 replies; 24+ messages in thread
From: Jonathan Rockway @ 2008-02-15  5:48 UTC (permalink / raw)
  To: emacs-pretest-bug

* On Thu, Feb 14 2008, Miles Bader wrote:
> David Reitter <david.reitter@gmail.com> writes:
>> If my Aquamacs statistics (as attached) are a representative sample,
>> about half the Emacs users are located outside the US.  (And I'm  pretty
>> sure that there is a very strong Japanese population that skews  this
>> even further).  How many of these commonly use double-spacing?
>
> Japanese doesn't use spacing at all, really...

Incidentally, M-a and M-e work as expected (with `。' separating
sentences).

Regards,
Jonathan Rockway





* Re: paragraphs.el: do forward-sentence and friends not work?
  2008-02-14 14:43           ` Stefan Monnier
  2008-02-14 15:52             ` David Reitter
@ 2008-06-13 14:14             ` David Reitter
  1 sibling, 0 replies; 24+ messages in thread
From: David Reitter @ 2008-06-13 14:14 UTC (permalink / raw)
  To: Emacs-Devel devel

[-- Attachment #1: Type: text/plain, Size: 2877 bytes --]

On 14 Feb 2008, at 14:43, Stefan Monnier wrote:

>>> Using two spaces after end of sentence enables Emacs to distinguish
>>> between periods that end sentences and periods for abbreviations.
>>> That is why it should be the default.
>
>> We can improve this to make it work without depending on the
>> double-space.
>
> But the period-single-space vs period-double-space distinction  
> allows us
> to get it right 100% in many more languages than just English.
>
>
>        Stefan "Who switched to non-French spacing even when writing  
> French"


Following up on this discussion: one could arrive at the solution
below, which does NOT change the default of `sentence-end-double-space'
(it is t), but introduces a customization variable that allows
users to configure the behavior for recognition only.

By default it is nil, which allows Emacs to recognize sentence ends  
even if the period is followed by only one space, as is common in many  
languages.
Would this have ill effects?  Does `fill-nobreak-p' (in fill.el) need  
to respect this variable as well?




*** lisp/textmodes/paragraphs.el	17 Apr 2008 10:52:44 +0100	1.87.2.4
--- lisp/textmodes/paragraphs.el	13 Jun 2008 15:04:21 +0100	
***************
*** 130,135 ****
--- 130,148 ----
     :group 'fill)
  ;;;###autoload(put 'sentence-end-double-space 'safe-local-variable 'booleanp)

+ (defcustom sentence-end-double-space-for-recognition nil
+   "Non-nil means a single space does not end a sentence.
+ This is relevant for the recognition of sentence ends.  See also
+ `sentence-end-without-period' and `colon-double-space'.  If non-nil,
+ the value of `sentence-end-double-space' is used.
+
+ This value is used by the function `sentence-end' to construct the
+ regexp describing the end of a sentence, when the value of the variable
+ `sentence-end' is nil.  See Info node `(elisp)Standard Regexps'."
+   :type 'boolean
+   :group 'fill)
+ ;;;###autoload(put 'sentence-end-double-space-for-recognition 'safe-local-variable 'booleanp)
+
   (defcustom sentence-end-without-period nil
     "Non-nil means a sentence will end without a period.
   For example, a sentence in Thai text ends with double space but
***************
*** 188,194 ****
         (concat (if sentence-end-without-period "\\w  \\|")
   	      "\\("
   	      sentence-end-base
!               (if sentence-end-double-space
                     "\\($\\| $\\|\t\\|  \\)" "\\($\\|[\t ]\\)")
                 "\\|[" sentence-end-without-space "]+"
   	      "\\)"
--- 201,208 ----
         (concat (if sentence-end-without-period "\\w  \\|")
   	      "\\("
   	      sentence-end-base
!               (if (and sentence-end-double-space
! 		       sentence-end-double-space-for-recognition)
                     "\\($\\| $\\|\t\\|  \\)" "\\($\\|[\t ]\\)")
                 "\\|[" sentence-end-without-space "]+"
   	      "\\)"


