From: Reuben Thomas <rrt@sc3d.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 17742@debbugs.gnu.org
Subject: bug#17742: Acknowledgement (Support for enchant?)
Date: Tue, 20 Dec 2016 21:43:32 +0000
Message-ID: <CAOnWdogQhcrpnbNU9qTbOZekL9fx_BhixFhONuNFFOYuiXU9HQ@mail.gmail.com>
In-Reply-To: <834m1y4nj7.fsf@gnu.org>


On 20 December 2016 at 15:40, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Reuben Thomas <rrt@sc3d.org>
> > Date: Mon, 19 Dec 2016 21:47:42 +0000
> > Cc: 17742@debbugs.gnu.org
> >
> >     neither GNU Aspell nor hunspell offer any way to get this
> >     information (about character classes of dictionaries) via their APIs.
> >
> >     They provide this information in the dictionaries, and we glean it
> >     from there.  See ispell-parse-hunspell-affix-file and
> >     ispell-aspell-find-dictionary.
> >
> > The dictionaries are not part of the API (even where the format is
> > documented, the location may not be fixed), so it's not a good idea to rely
> > on them.
>
> If there's no better way, then I see no problem in relying on the
> dictionaries, and de-facto the results are satisfactory.
>

Agreed.


> > Having discovered that Aspell does not provide this information (I
> > checked again, and ispell-aspell-find-dictionary does not find this
> > information in the dictionaries, except for limited information about
> > otherchars; for casechars and not-casechars it defaults to [:alpha:]), I
> > shall investigate with the hunspell maintainers.
>
> Aspell provides some of that, and there's no reason to ignore what it
> does provide.
>

Agreed.

> Whether it's good enough depends on the dictionary and on what "(XP)"
> means.  It could be that "(XP)", including the parentheses, is a word
> the dictionary recognizes, something akin to "(C)", i.e. copyright
> sign.


Thanks, that's a good example.

So, if "(C)" is in the dictionary, then with [:graph:] as casechars, if I
run ispell-word with point anywhere in "(C)", Emacs will send "(C)", and it
will come back as correct. If casechars were only [:alpha:], then Emacs
would send "C", and it would come back as wrong.

Conversely, if "C" is in the dictionary, then if I run ispell-word with
casechars set to [:graph:] then Emacs will send "(C)" and it will come back
as correct (because Hunspell will ignore the non-wordchars characters). It
would also work with casechars set to [:alpha:].

So with casechars set to [:graph:], there's no false positive or false
negative.
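
To make the four cases above concrete, here is a rough Python sketch (not
Emacs code). It assumes a toy speller that accepts a word if it is in the
dictionary either as sent or after non-wordchars are dropped from its ends,
which is only an approximation of what Hunspell does. The names
word_at_point and speller_accepts are hypothetical, and ALPHA and GRAPH are
ASCII-only stand-ins for [:alpha:] and [:graph:].

```python
import string

# ASCII-only stand-ins for the [:alpha:] and [:graph:] character classes.
ALPHA = set(string.ascii_letters)
GRAPH = set(string.ascii_letters + string.digits + string.punctuation)


def word_at_point(text, point, casechars):
    """Return the maximal run of casechars around index `point`, or ""."""
    if not (0 <= point < len(text)) or text[point] not in casechars:
        return ""
    start = point
    while start > 0 and text[start - 1] in casechars:
        start -= 1
    end = point + 1
    while end < len(text) and text[end] in casechars:
        end += 1
    return text[start:end]


def speller_accepts(word, dictionary, wordchars):
    """Toy speller (assumption): exact match, or match once leading and
    trailing characters outside `wordchars` have been dropped."""
    if word in dictionary:
        return True
    ignorable = "".join(GRAPH - wordchars)
    return word.strip(ignorable) in dictionary


text = "see (C) here"
point = text.index("C")

for dictionary in ({"(C)"}, {"C"}):
    for name, casechars in (("alpha", ALPHA), ("graph", GRAPH)):
        sent = word_at_point(text, point, casechars)
        ok = speller_accepts(sent, dictionary, ALPHA)
        print(sorted(dictionary), name, repr(sent), ok)
```

Under these assumptions, only the combination of "(C)" in the dictionary
and [:alpha:] as casechars is rejected; [:graph:] is accepted in both cases.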


> I don't see why it would be fragile with Enchant when it isn't with
> its back-ends.


Because there's no guarantee that Enchant will continue to use backends in
the same way as at present.

>   And avoiding even fragile methods is worse than using
> them, when there's no better way of gleaning the same information, and
> the information is important (as it is in this case).
>

Agreed.


> I think you are drawing too radical conclusions from trying that with
> a single word and a single dictionary.  Which string was sent to the
> speller in this case,


"(XP)"

> and is that the string you expected to be sent?
>

I don't have strong feelings about that.

> >     Moreover, even when we send entire lines to the speller, we want to
> >     skip lines that include only non-word characters.
> >
> > Why?
>
> To avoid false positives and false negatives, as explained above.
>

But such characters will be ignored by the spellchecker (unless perhaps
they occur in the personal word list). So I'm not sure how they would
generate false positives or negatives.

> First, Enchant could be using Hunspell as its engine, right?
>

Sure.

> And second, AFAIU this discussion started by you proposing to get rid
> of CASECHARS etc., for all spellers, not just for Enchant, something
> that will definitely cause degradation.
>

I didn't mean to propose that. I'm sorry if I gave that impression. I'm
just saying I don't want to put in the work now to add that support for
Enchant. I have not changed (and do not propose to change) the support for
Hunspell.

> It sounds like the important part of our disagreement is in the last
> sentence.  If so, I hope I've succeeded to change your mind.  Failing
> that, all I can suggest is to study the spelling rules of modern
> speller, such as Hunspell, and see how this information is used there.
>

As I already said, Hunspell does not provide this information to
applications. So consumers of Hunspell have two choices:

1. Use side channels (as Emacs does).

2. Have some arbitrary idea of what constitutes a word.

The fact that an API to get the wordchars from hunspell is only now being
considered for addition suggests to me that neither the maintainers of
hunspell nor the developers of hunspell-using programs have thought this
was particularly important.

> I tried to explain that above: you will get falses and/or irrelevant
> or missing corrections from the speller.  For example, if you send
> "foo.bar", and the speller doesn't support '.' as a word-constituent
> character, you will get separate suggestions for "foo" and "bar", and
> won't get "foobar".
>

What happens at the moment (with my Enchant patch) is I get the error
"Ispell and its process have different character maps". I wouldn't expect
"foobar" in any case, if "." is not a constituent character, though I might
be surprised to get a correction for a word I thought I wasn't pointing at
(but I could be surprised in this way in any case, if the dictionary has a
surprising set of wordchars).
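
As a rough illustration of the "foo.bar" case, here is a small Python
sketch of how the set of word-constituent characters changes what gets
checked. It is not how ispell.el or any speller is implemented; split_words
and its extra_wordchars parameter are hypothetical names, and the base
alphabet is assumed to be ASCII letters only.

```python
import re


def split_words(text, extra_wordchars=""):
    """Split `text` into maximal runs of ASCII letters plus any characters
    listed in `extra_wordchars` (assumed word constituents)."""
    pattern = "[A-Za-z" + re.escape(extra_wordchars) + "]+"
    return re.findall(pattern, text)


# '.' is not a word constituent: two separate words get checked.
print(split_words("foo.bar"))
# '.' is a word constituent: the whole string is checked as one word.
print(split_words("foo.bar", extra_wordchars="."))
```

With '.' excluded, the pieces "foo" and "bar" are checked separately, so a
suggestion such as "foobar" would not be expected in that case.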


> I also don't understand why you want to remove this information, that
> is already there, is not harder to get with Enchant than it is without
> it, and the code which supports it is already there?
>

I'm not proposing to remove this information. I am proposing not to add it
for Enchant yet (because that will require extra work and code), and I am
hoping to end up with a simpler way to get it, via the API.

-- 
http://rrt.sc3d.org
