From: Visuwesh <visuweshm@gmail.com>
To: "André A. Gomes" <andremegafone@gmail.com>
Cc: Lars Ingebrigtsen <larsi@gnus.org>,
56844@debbugs.gnu.org, Juri Linkov <juri@linkov.net>
Subject: bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
Date: Tue, 02 Aug 2022 18:18:18 +0530
Message-ID: <87czdi3j71.fsf@gmail.com>
In-Reply-To: <8735eezx8x.fsf@gmail.com> ("André A. Gomes"'s message of "Tue, 02 Aug 2022 14:43:42 +0300")
[Tuesday August 02, 2022] André A. Gomes wrote:
> Juri Linkov <juri@linkov.net> writes:
>
>>>> It now gracefully handles the case when abbreviations such as e.g. or
>>>> i.e. are used in sentences.
>>>
>>> [...]
>>>
>>>> + (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>>
>>> I'm not quite sure I understand this patch. Are you changing this to
>>> only consider punctuation that's followed by an upper-case character to
>>> be sentence-end punctuation?
>>
>> It would be better to add such heuristics to repunctuate-sentences-filter,
>> so anyone could customize it.
>
> In general I'd agree with you, but this patch is actually fixing a bug,
> not introducing a personal preference. That's how I see it at least.
This breaks repunctuate-sentences for languages that don't have the
concept of upper and lower case characters. Try repunctuate-sentences
with and without your patch on the following text:
தொழிற்சாலை யந்திரங்கள் தேவையான மட்டும் அந்தத் தொழிலாளர்களது சக்தியை உறிஞ்சித்
தீர்த்துவிடுவதோடு அந்த நாள் விழுங்கப்பட்டுவிடும். எந்தவிதமான எச்சமிச்சங்களும் இல்லாமல்
அன்றையப் பொழுது அழிந்து கழியும்; மனிதனும் தனது சவக்குழியை நோக்கி ஓரடி
முன்னேறிவிடுவான். ஆனால் இப்போதோ ஒய்வின்
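
To illustrate the problem, here is a minimal sketch, not the patched
function itself. The `tail' regexp is a simplified stand-in for the end
of the regexp in the patch (the quote and bracket groups are dropped),
and binding `case-fold-search' to nil is an assumption made so that
[:upper:] matches only upper-case characters.

```elisp
;; Simplified stand-in for the tail of the patched regexp:
;; sentence-ending punctuation, one or more spaces, then an
;; upper-case character.  The regexp in the patch also handles
;; quotes and brackets around the punctuation.
(let ((case-fold-search nil)   ; assumption: case-sensitive matching
      (tail "[.?!] +[[:upper:]]"))
  (list
   ;; Latin script: "T" is upper-case, so a sentence boundary is found.
   (string-match-p tail "One sentence. Two sentences.")
   ;; Tamil script (taken from the sample text): the letters have no
   ;; case, so [:upper:] cannot match and no boundary is found.
   (string-match-p tail "விழுங்கப்பட்டுவிடும். எந்தவிதமான")))
```

Evaluating the form should give a list whose first element is non-nil
and whose second element is nil, which is why the sentence boundaries
in the Tamil text are never offered for repunctuation.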