unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
@ 2022-07-30 18:06 André A. Gomes
  2022-07-31  8:34 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 11+ messages in thread
From: André A. Gomes @ 2022-07-30 18:06 UTC
  To: 56844


Tags: patch

Hi Emacs,

Please find the patch below.



In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
Windowing system distributor 'The X.Org Foundation', version 11.0.12101004
System Description: Guix System

Configured using:
 'configure
 CONFIG_SHELL=/gnu/store/4y5m9lb8k3qkb1y9m02sw9w9a6hacd16-bash-minimal-5.1.8/bin/bash
 SHELL=/gnu/store/4y5m9lb8k3qkb1y9m02sw9w9a6hacd16-bash-minimal-5.1.8/bin/bash
 --prefix=/gnu/store/7a6fnkqrxb0chmvj63f7ddr6wg3pq9g5-emacs-next-29.0.50-1.0a5477b
 --enable-fast-install --with-modules --with-cairo
 --disable-build-details'


[-- Attachment #2: 0001-Refactor-repunctuate-sentences-to-accommodate-corner.patch --]
[-- Type: text/patch, Size: 3795 bytes --]

From c57f51b7bfec3e3e5c9c2f680d7936c3e546bb28 Mon Sep 17 00:00:00 2001
From: André A. Gomes <andremegafone@gmail.com>
Date: Sat, 30 Jul 2022 21:01:38 +0300
Subject: [PATCH] Refactor repunctuate-sentences to accommodate corner case.

It now gracefully handles the case when abbreviations such as e.g. or
i.e. are used in sentences.
---
 lisp/textmodes/paragraphs.el            | 32 +++++++++++++------------
 test/lisp/textmodes/paragraphs-tests.el |  5 ++--
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/lisp/textmodes/paragraphs.el b/lisp/textmodes/paragraphs.el
index cd726ad4776..89624b66318 100644
--- a/lisp/textmodes/paragraphs.el
+++ b/lisp/textmodes/paragraphs.el
@@ -506,29 +506,31 @@ It is advised to use `add-function' on this to add more filters,
 for example, `(looking-back (rx (or \"e.g.\" \"i.e.\") \" \") 5)'
 with a set of predefined abbreviations to skip from adding two spaces.")
 
-(defun repunctuate-sentences (&optional no-query start end)
-  "Put two spaces at the end of sentences from point to the end of buffer.
-It works using `query-replace-regexp'.  In Transient Mark mode,
-if the mark is active, operate on the contents of the region.
-Second and third arg START and END specify the region to operate on.
-If optional argument NO-QUERY is non-nil, make changes without asking
-for confirmation.  You can use `repunctuate-sentences-filter' to add
-filters to skip occurrences of spaces that don't need to be replaced."
-  (interactive (list nil
-                     (if (use-region-p) (region-beginning))
-                     (if (use-region-p) (region-end))))
-  (let ((regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +")
-        (to-string "\\1\\2\\3  "))
+(defun repunctuate-sentences (&optional no-query)
+  "Put two spaces at the end of sentences.
+
+In Transient Mark mode, if the mark is active, operate on the
+contents of the region.  If optional argument NO-QUERY is
+non-nil, make changes without asking for confirmation.
+
+Use `repunctuate-sentences-filter' to add filters to skip
+occurrences of spaces that don't need to be replaced."
+  (interactive "P")
+  (let ((beg (if (use-region-p) (region-beginning) (point-min)))
+        (end (if (use-region-p) (region-end) (point-max)))
+        (case-fold-search nil)
+        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
+        (to-string "\\1\\2\\3  \\4"))
     (if no-query
         (progn
-          (when start (goto-char start))
+          (goto-char beg)
           (while (re-search-forward regexp end t)
             (replace-match to-string)))
       (unwind-protect
           (progn
             (add-function :after-while isearch-filter-predicate
                           repunctuate-sentences-filter)
-            (query-replace-regexp regexp to-string nil start end))
+            (query-replace-regexp regexp to-string nil beg end))
         (remove-function isearch-filter-predicate
                          repunctuate-sentences-filter)))))
 
diff --git a/test/lisp/textmodes/paragraphs-tests.el b/test/lisp/textmodes/paragraphs-tests.el
index e54b459b20e..53735b4bf4b 100644
--- a/test/lisp/textmodes/paragraphs-tests.el
+++ b/test/lisp/textmodes/paragraphs-tests.el
@@ -101,10 +101,11 @@
 
 (ert-deftest paragraphs-tests-repunctuate-sentences ()
   (with-temp-buffer
-    (insert "Just. Some. Sentences.")
+    (insert "Just. Some. Sentences. Yet another, e.g. this one.")
     (goto-char (point-min))
     (repunctuate-sentences t)
-    (should (equal (buffer-string) "Just.  Some.  Sentences."))))
+    (should (equal (buffer-string)
+                   "Just.  Some.  Sentences.  Yet another, e.g. this one."))))
 
 (ert-deftest paragraphs-tests-backward-sentence ()
   (with-temp-buffer
-- 
2.37.1



-- 
André A. Gomes
"You cannot even find the ruins..."

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-07-30 18:06 bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case André A. Gomes
@ 2022-07-31  8:34 ` Lars Ingebrigtsen
  2022-07-31 19:49   ` Juri Linkov
  2022-08-02 11:41   ` André A. Gomes
  0 siblings, 2 replies; 11+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-31  8:34 UTC
  To: André A. Gomes; +Cc: 56844

André A. Gomes <andremegafone@gmail.com> writes:

> It now gracefully handles the case when abbreviations such as e.g. or
> i.e. are used in sentences.

[...]

> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")

I'm not quite sure I understand this patch.  Are you changing this to
only consider punctuation that's followed by an upper-case character to
be sentence-end punctuation?

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-07-31  8:34 ` Lars Ingebrigtsen
@ 2022-07-31 19:49   ` Juri Linkov
  2022-08-02 11:43     ` André A. Gomes
  2022-08-02 11:41   ` André A. Gomes
  1 sibling, 1 reply; 11+ messages in thread
From: Juri Linkov @ 2022-07-31 19:49 UTC
  To: Lars Ingebrigtsen; +Cc: André A. Gomes, 56844

>> It now gracefully handles the case when abbreviations such as e.g. or
>> i.e. are used in sentences.
>
> [...]
>
>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>
> I'm not quite sure I understand this patch.  Are you changing this to
> only consider punctuation that's followed by an upper-case character to
> be sentence-end punctuation?

It would be better to add such heuristics to repunctuate-sentences-filter,
so anyone could customize it.

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-07-31  8:34 ` Lars Ingebrigtsen
  2022-07-31 19:49   ` Juri Linkov
@ 2022-08-02 11:41   ` André A. Gomes
  2022-08-02 11:45     ` Lars Ingebrigtsen
  2022-08-02 12:35     ` Stefan Kangas
  1 sibling, 2 replies; 11+ messages in thread
From: André A. Gomes @ 2022-08-02 11:41 UTC
  To: Lars Ingebrigtsen; +Cc: 56844

Lars Ingebrigtsen <larsi@gnus.org> writes:

>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>
> I'm not quite sure I understand this patch.  Are you changing this to
> only consider punctuation that's followed by an upper-case character to
> be sentence-end punctuation?

Yes.  The test changes in the patch illustrate this:

--8<---------------cut here---------------start------------->8---
 (ert-deftest paragraphs-tests-repunctuate-sentences ()
   (with-temp-buffer
-    (insert "Just. Some. Sentences.")
+    (insert "Just. Some. Sentences. Yet another, e.g. this one.")
     (goto-char (point-min))
     (repunctuate-sentences t)
-    (should (equal (buffer-string) "Just.  Some.  Sentences."))))
+    (should (equal (buffer-string)
+                   "Just.  Some.  Sentences.  Yet another, e.g. this one."))))
--8<---------------cut here---------------end--------------->8---
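
To make the difference concrete, here is a small sketch that can be
evaluated in emacs --batch or *scratch*.  It counts how many sentence
ends the old and the patched regexps (copied verbatim from the diff)
find in the new test string.  The names my-old-regexp, my-new-regexp
and my-count-matches are hypothetical, and matching on a string only
approximates what the command does in a buffer.

```emacs-lisp
;; Regexps copied from the patch: the old one accepts any `.', `?' or
;; `!' followed by spaces; the patched one also requires the next
;; character to be a double quote, a single quote, a closing paren or
;; an upper-case letter.
(defconst my-old-regexp
  "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +")
(defconst my-new-regexp
  "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")

(defun my-count-matches (regexp string)
  "Return the number of non-overlapping matches of REGEXP in STRING.
Matching is case-sensitive, as in the patched `repunctuate-sentences'."
  (let ((case-fold-search nil)
        (count 0)
        (pos 0))
    (while (string-match regexp string pos)
      (setq count (1+ count)
            pos (match-end 0)))
    count))

(let ((text "Just. Some. Sentences. Yet another, e.g. this one."))
  (list (my-count-matches my-old-regexp text)    ; => 4, "e.g. " included
        (my-count-matches my-new-regexp text)))  ; => 3, "e.g. this" skipped
```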

Thanks.
 

-- 
André A. Gomes
"You cannot even find the ruins..."

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-07-31 19:49   ` Juri Linkov
@ 2022-08-02 11:43     ` André A. Gomes
  2022-08-02 12:48       ` Visuwesh
  0 siblings, 1 reply; 11+ messages in thread
From: André A. Gomes @ 2022-08-02 11:43 UTC
  To: Juri Linkov; +Cc: Lars Ingebrigtsen, 56844

Juri Linkov <juri@linkov.net> writes:

>>> It now gracefully handles the case when abbreviations such as e.g. or
>>> i.e. are used in sentences.
>>
>> [...]
>>
>>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>
>> I'm not quite sure I understand this patch.  Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> It would be better to add such heuristics to repunctuate-sentences-filter,
> so anyone could customize it.

In general I'd agree with you, but this patch is actually fixing a bug,
not introducing a personal preference.  That's how I see it at least.


-- 
André A. Gomes
"You cannot even find the ruins..."

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-08-02 11:41   ` André A. Gomes
@ 2022-08-02 11:45     ` Lars Ingebrigtsen
  2022-08-02 12:10       ` Robert Pluim
  2022-08-02 12:35     ` Stefan Kangas
  1 sibling, 1 reply; 11+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-02 11:45 UTC
  To: André A. Gomes; +Cc: 56844

André A. Gomes <andremegafone@gmail.com> writes:

>> I'm not quite sure I understand this patch.  Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> Yes.

I don't think that'll be a generally welcome change -- some people write
using non-standard orthography.  If this change is to be made, it has to
be optional.

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-08-02 11:45     ` Lars Ingebrigtsen
@ 2022-08-02 12:10       ` Robert Pluim
  0 siblings, 0 replies; 11+ messages in thread
From: Robert Pluim @ 2022-08-02 12:10 UTC
  To: Lars Ingebrigtsen; +Cc: André A. Gomes, 56844

>>>>> On Tue, 02 Aug 2022 13:45:05 +0200, Lars Ingebrigtsen <larsi@gnus.org> said:

    Lars> André A. Gomes <andremegafone@gmail.com> writes:
    >>> I'm not quite sure I understand this patch.  Are you changing this to
    >>> only consider punctuation that's followed by an upper-case character to
    >>> be sentence-end punctuation?
    >> 
    >> Yes.

    Lars> I don't think that'll be a generally welcome change -- some people write
    Lars> using non-standard orthography.  If this change is to be made, it has to
    Lars> be optional.

It doesnʼt even have to be that non-standard.  Consider:

   De deur sloeg open.  de Valk stond in het licht.

Thatʼs pedantically incorrect, but there are many people (myself
included) who think that certain grammarians should keep quiet 😀
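
A quick way to check this is to run both regexp/replacement pairs from
the diff over a single-space version of the Dutch example.  This is a
sketch: my-repunctuate-string is a hypothetical helper, and
replace-regexp-in-string on a string only approximates what
repunctuate-sentences does in a buffer.

```emacs-lisp
(defun my-repunctuate-string (regexp to-string string)
  "Replace every match of REGEXP in STRING with TO-STRING, case-sensitively."
  (let ((case-fold-search nil))              ; the patch binds this too
    ;; FIXEDCASE = t: insert TO-STRING without case conversion.
    (replace-regexp-in-string regexp to-string string t)))

(let ((text "De deur sloeg open. de Valk stond in het licht."))
  (list
   ;; Old pair from the diff: the sentence end is found.
   (my-repunctuate-string "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +"
                          "\\1\\2\\3  " text)
   ;; Patched pair: "de" starts with a lower-case letter, so the single
   ;; space is left alone.
   (my-repunctuate-string
    "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)"
    "\\1\\2\\3  \\4" text)))
;; => ("De deur sloeg open.  de Valk stond in het licht."
;;     "De deur sloeg open. de Valk stond in het licht.")
```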

Robert

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-08-02 11:41   ` André A. Gomes
  2022-08-02 11:45     ` Lars Ingebrigtsen
@ 2022-08-02 12:35     ` Stefan Kangas
  2022-08-02 19:59       ` Juri Linkov
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Kangas @ 2022-08-02 12:35 UTC
  To: André A. Gomes, Lars Ingebrigtsen; +Cc: 56844

André A. Gomes <andremegafone@gmail.com> writes:

>> I'm not quite sure I understand this patch.  Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> Yes.

FWIW, I would rather want to specify a list of ignored abbreviations
that I'd like to not consider ending a sentence.  This could include
standard US ones like "e.g.", "i.e.", etc. by default, but should be
customizable so I can add any localized equivalents.

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-08-02 11:43     ` André A. Gomes
@ 2022-08-02 12:48       ` Visuwesh
  0 siblings, 0 replies; 11+ messages in thread
From: Visuwesh @ 2022-08-02 12:48 UTC
  To: André A. Gomes; +Cc: Lars Ingebrigtsen, 56844, Juri Linkov

[Tuesday, August 02, 2022] André A. Gomes wrote:

> Juri Linkov <juri@linkov.net> writes:
>
>>>> It now gracefully handles the case when abbreviations such as e.g. or
>>>> i.e. are used in sentences.
>>>
>>> [...]
>>>
>>>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>>
>>> I'm not quite sure I understand this patch.  Are you changing this to
>>> only consider punctuation that's followed by an upper-case character to
>>> be sentence-end punctuation?
>>
>> It would be better to add such heuristics to repunctuate-sentences-filter,
>> so anyone could customize it.
>
> In general I'd agree with you, but this patch is actually fixing a bug,
> not introducing a personal preference.  That's how I see it at least.

This breaks repunctuate-sentences for languages that don't have the
concept of upper and lower case characters.  Try repunctuate-sentences
with and without your patch on the following text:

தொழிற்சாலை யந்திரங்கள் தேவையான மட்டும் அந்தத் தொழிலாளர்களது சக்தியை உறிஞ்சித்
தீர்த்துவிடுவதோடு அந்த நாள் விழுங்கப்பட்டுவிடும். எந்தவிதமான எச்சமிச்சங்களும் இல்லாமல்
அன்றையப் பொழுது அழிந்து கழியும்; மனிதனும் தனது சவக்குழியை நோக்கி ஓரடி
முன்னேறிவிடுவான். ஆனால் இப்போதோ ஒய்வின்
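
The failure can be reproduced without a buffer.  A minimal check, as a
sketch evaluated with case-fold-search bound to nil as in the patch,
on the first sentence boundary of the sample above (my-tamil-sample is
a hypothetical name):

```emacs-lisp
;; First sentence boundary taken from the Tamil text above.
(defconst my-tamil-sample "விழுங்கப்பட்டுவிடும். எந்தவிதமான")

(let ((case-fold-search nil))
  (list
   ;; Tamil letters have no case, so [:upper:] never matches them.
   (string-match-p "[[:upper:]]" "எ")
   ;; The old regexp from the diff still finds the sentence end ...
   (string-match-p "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +"
                   my-tamil-sample)
   ;; ... but the patched one needs a following upper-case letter
   ;; (or quote/paren), so it finds nothing.
   (string-match-p
    "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)"
    my-tamil-sample)))
;; => (nil <index of the full stop> nil)
```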

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-08-02 12:35     ` Stefan Kangas
@ 2022-08-02 19:59       ` Juri Linkov
  2022-09-02 10:47         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Linkov @ 2022-08-02 19:59 UTC
  To: Stefan Kangas; +Cc: André A. Gomes, Lars Ingebrigtsen, 56844

> FWIW, I would rather want to specify a list of ignored abbreviations
> that I'd like to not consider ending a sentence.  This could include
> standard US ones like "e.g.", "i.e.", etc. by default, but should be
> customizable so I can add any localized equivalents.

Please see an example in the docstring of the variable
'repunctuate-sentences-filter'.
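
Following that docstring example, a customizable list of abbreviations
could be layered on the existing filter roughly as below.  This is a
sketch, not code from Emacs: my-repunctuate-ignored-abbrevs and
my-repunctuate-skip-abbrevs are hypothetical names, and it assumes (as
the docstring's looking-back example implies) that point is at the end
of the candidate match when the filter runs.  In the code shown in this
thread the filter is only installed around query-replace-regexp, so it
has no effect when NO-QUERY is non-nil.

```emacs-lisp
;; Hypothetical user option: abbreviations that should not end a sentence.
(defvar my-repunctuate-ignored-abbrevs '("e.g." "i.e." "cf." "vs.")
  "Abbreviations after which `repunctuate-sentences' should not add a space.")

(defun my-repunctuate-skip-abbrevs (_start _end)
  "Return nil if the text just before point is an ignored abbreviation.
Assumes point is at the end of the candidate match, i.e. after the
spaces that `repunctuate-sentences' would replace."
  (not (looking-back (concat "\\<"
                             (regexp-opt my-repunctuate-ignored-abbrevs)
                             " +")
                     (line-beginning-position))))

;; With :after-while, a match is replaced only if the default filter
;; and this one both return non-nil.
(add-function :after-while repunctuate-sentences-filter
              #'my-repunctuate-skip-abbrevs)
```

To undo it, (remove-function repunctuate-sentences-filter
#'my-repunctuate-skip-abbrevs) restores the previous filter.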

* bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
  2022-08-02 19:59       ` Juri Linkov
@ 2022-09-02 10:47         ` Lars Ingebrigtsen
  0 siblings, 0 replies; 11+ messages in thread
From: Lars Ingebrigtsen @ 2022-09-02 10:47 UTC
  To: Juri Linkov; +Cc: André A. Gomes, 56844, Stefan Kangas

Juri Linkov <juri@linkov.net> writes:

>> FWIW, I would rather want to specify a list of ignored abbreviations
>> that I'd like to not consider ending a sentence.  This could include
>> standard US ones like "e.g.", "i.e.", etc. by default, but should be
>> customizable so I can add any localized equivalents.
>
> Please see an example in the docstring of the variable
> 'repunctuate-sentences-filter'.

I think the conclusion here is that we don't want to change how
repunctuate-sentences works, so I'm closing this bug report.

end of thread

Thread overview: 11+ messages
2022-07-30 18:06 bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case André A. Gomes
2022-07-31  8:34 ` Lars Ingebrigtsen
2022-07-31 19:49   ` Juri Linkov
2022-08-02 11:43     ` André A. Gomes
2022-08-02 12:48       ` Visuwesh
2022-08-02 11:41   ` André A. Gomes
2022-08-02 11:45     ` Lars Ingebrigtsen
2022-08-02 12:10       ` Robert Pluim
2022-08-02 12:35     ` Stefan Kangas
2022-08-02 19:59       ` Juri Linkov
2022-09-02 10:47         ` Lars Ingebrigtsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
