From: Lynn Winebarger <owinebar@gmail.com>
To: Richard Stallman <rms@gnu.org>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Grammar checking
Date: Thu, 6 Apr 2023 08:29:15 -0400
Message-ID: <CAM=F=bAvFMMDoB_5Z2V6ccz3uoJc_iSGy3+9Z06FftvKHFoY7g@mail.gmail.com>
In-Reply-To: <E1pjAVt-0004D3-Dz@fencepost.gnu.org>

On Sun, Apr 2, 2023, 11:05 PM Richard Stallman <rms@gnu.org> wrote:

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > > If the released (and free) LanguageTool _program_ gives adequate
>   > > results, we could make Emacs support working with that.  But we
>   > > should take pains _not_ to support the kind of communication that
>   > > that SaaSS server offers.
>   > They may not make it easy, see this complaint on their forum:
>
> Would you please spell out what it is
> that they "may not make easy"?
>
>   > https://forum.languagetool.org/t/about-the-premium-version-of-languagetool/8469
>
> I looked at that page, but lacking the context, I can't understand it
> well enough to divine the point that your message hints at.
>


I may have been mistaken in my first reading. I read the message as saying
that any process using the free service would receive an advertisement of
how many corrections would be found by the premium service.  I am assuming
that at the least emacs maintainers would want to filter that out by
default. The forum message may only refer to using the web user interface
for checking sample text, though.

If my first reading is correct, though, it could be difficult to ensure
such advertising is always filtered. That really depends on the owners of
the service, who can change over time.


>   > * The process for contributing "rules" to the free version is to go
>   > through the SaaSS's forum sites.
>   > https://community.languagetool.org/rule/list?lang=en shows 5919 rules
>   > for english, presumably in the basic version.
>

I found a more on-point reference addressing my concern, i.e. how will
contributions replicating the rules implemented in the premium version be
treated by the project developers:

https://forum.languagetool.org/t/free-lt-premium-for-contributors/8639

Since the exact nature of those premium rules is presumably not disclosed
just by virtue of having a premium subscription, I can only guess this
reverse engineering would happen by following a process like:

1) Take a large corpus of texts with known grammatical errors, e.g.
https://www.cl.cam.ac.uk/research/nl/bea2019st/ or
https://ai.googleblog.com/2021/08/the-c4200m-synthetic-dataset-for.html?m=1
2) Record the results produced by the free and premium versions on each
test case
3) Formulate rules that specifically fix issues found by the premium
version and not the free version.

Perhaps the LanguageTool.org owners would consider this a violation of
their service's terms and conditions as a justification for not accepting
contributions of source code to the project.

OTOH, if an emacs developer or user simply wants to systematically improve
the free version of LanguageTool, the most obvious method for doing so
would be

1) Take a large corpus of texts with known grammatical errors, see above
2) Record the results produced by the free rule set
3) Formulate rules that specifically fix issues found, prioritizing issues
by some measure of expected frequency in real text
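
As a rough illustration of steps 2 and 3, the prioritization could be
sketched as below.  This is only a sketch under stated assumptions: the
error-type labels, the dictionary layout, and the function names are all
hypothetical, and it assumes the checker's output has already been mapped
onto the corpus's error labels, which is itself a nontrivial step.

```python
from collections import Counter


def missed_error_counts(gold, detected):
    """Count annotated error types that the checker failed to flag.

    gold:     {sentence_id: set of error-type labels known to be present}
    detected: {sentence_id: set of error-type labels the checker reported}
    Both layouts are assumptions made for this sketch.
    """
    missed = Counter()
    for sentence_id, known in gold.items():
        found = detected.get(sentence_id, set())
        # Only errors present in the corpus annotation but not reported count.
        missed.update(known - found)
    return missed


def prioritize(missed):
    """Order error types by how often they were missed, most frequent first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    return [label for label, _ in
            sorted(missed.items(), key=lambda item: (-item[1], item[0]))]
```

Corpus frequency is only a proxy for "expected frequency in real text";
a different weighting could be substituted in `prioritize` without
changing the rest.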

Presumably the additional rules in the premium version have been added
precisely according to some measure of their expected frequency, possibly
by analysis of real-world text from users over the years the service has
been available.

It would be surprising if any successful attempt to systematically improve
the rules in LanguageTool did not overlap significantly with the rules
found in the premium version, simply because of what "successful" means in
statistical terms and the assumption that the premium rule set is likewise
"successful".

> We could consider forking that code in a limited way: adding new rules.
>
> In general, we should cooperate with upstream developers, but we don't
> have to jump through hoops to do so.
>

I'm not personally very pure in the software I use, so I'm surprised at how
much the issues I perceive seem to bother me.  I've been an emacs user
since the 90s, and it would never have occurred to me that I would ever be
concerned about contributing code to improve emacs, whether directly to the
emacs projects, or indirectly through one of its dependencies.  From what I
see now, that would no longer be the case if grammar checking support that
depends on LanguageTool is added.

I suppose there's another, even more abstract concern with open source
software that is developed specifically in conjunction with a SaaSS
business: to what extent does data from users of the SaaSS drive
development, or even get incorporated in some (aggregated or statistical)
form in the source code?  For example, what if a grammar checker
incorporated a "deep learning" system that had been trained on such data?
In most cases, it would be impossible to reconstruct the training data set
starting from the data specifying the trained model.  But would it be
acceptable for a GNU software project to depend on such software?  I don't
know the answer, but I think it's a real question when dealing with open
source software from projects like LanguageTool.  I also don't know or
allege that there's anything like that in LanguageTool, but neither can I
be certain that there is not.  I can't help but think this business model -
maintaining an open source version as a loss leader for a proprietary or
SaaSS version - is only going to continue growing, and hence the need to
address it in the GNU coding manual section 8 or otherwise.


>   >    Looking at the java code makes it appear there are
>   > many hard-coded rules, but I don't know if that is really the case.
>   > That is whether the code for the rules are some generic implementation
>   > of the rules coded in XML, or if the XML rule sets are being
>   > translated into java code at some point in the build process.
>
> I can only guess at the context this is about, but it sounds like
> you're suggesting that it may not be clear what form of the code is
> the real source code.  Do they not say?  Does their source release
> include the XML?  Does it include Make rules to translate the XML into
> Java?


I don't do a lot of Java coding, and it was a cursory examination.  I did
eventually find the xml rulesets linked to from
https://dev.languagetool.org/languages, which is classified as "user
documentation". It appears most rules in well-supported languages are in
XML, with some coded in Java.  Whether the coding in Java is for speed or
to overcome limitations of the semantics of rules expressed in XML, I have
no idea.

I'm going to leave my concerns at that.  I've already spent too much time
on this as it is.  I just thought the last-minute hair-pulling discussion
of tree-sitter grammar files, which frankly seem to have much less ethical
baggage, should not be repeated after grammar checking support depending on
LanguageTool is already implemented and adopted.

Lynn

