bug-gnu-emacs

bug#23097: 24.5; ispell.el: lines with both CASECHARS and NOT-CASECHARS


From: Eli Zaretskii
Subject: bug#23097: 24.5; ispell.el: lines with both CASECHARS and NOT-CASECHARS get sent to the spell checker
Date: Mon, 17 Aug 2020 19:40:58 +0300

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Cc: 23097@debbugs.gnu.org
> Date: Mon, 17 Aug 2020 12:20:08 +0300
> 
> This is not an external software bug, but very much an Emacs bug.
> 
> I'm not sure what was the initial design idea for CASECHARS and 
> NOT-CASECHARS, but whatever it was, it would not work effectively due to 
> feeding the entire line. The most obvious practical use for them (being 
> able to spellcheck languages with completely different alphabets without 
> the spellchecker misfiring on either pass) would not work either.

The original design was that a spell-checker supports a single
language, and any text in other languages is a spelling mistake.  This
is still true for Ispell and for Aspell; only Hunspell (and Enchant,
when it uses Hunspell as its back-end) supports multiple languages.
With Hunspell, ispell.el effectively ignores CASECHARS and
NOT-CASECHARS, and instead uses the character set specified by the
dictionary file itself.

This is the only multi-dictionary spell-checking configuration that
ispell.el currently supports.  Which is why, when you first reported
this, I asked you why you couldn't use Hunspell; your answer, which
described some kind of failure related to encoding, I couldn't
understand then and I don't understand now (primarily because that
feature works for me).

Instead, you seem to insist on using Aspell in a way that to me sounds
like a kludge: spell-check the region with one dictionary, then
restart ispell.el with another dictionary and spell-check the same
region again.  AFAIU, you'd like ispell.el to support this kind of
workaround OOTB.  Is that correct, or did I miss something?

If my understanding is correct, then, apart from being a kludgey
solution for a problem that has a much cleaner one, I don't think I
understand how this could work well in general.  Suppose you have in
your buffer a mis-spelled word such as this:

   fooЫbar

with the Cyrillic letter being there by accident: perhaps you
unintentionally pressed a key when you shouldn't have.  Or imagine the
following typo:

   fooбар

which could happen if you forgot to switch the input method.

With your proposed mode of operation, the spell-checker will check
partial words and decide that in both cases there are no spelling
mistakes here, because each partial word is spelled correctly in its
language.  But clearly these are typos that need to be flagged.

Thus, just using 2 sets of characters is not enough to handle these
typos intelligently, as you'd get a lot of false negatives.
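
To make the false-negative problem concrete, here is a minimal sketch in
Python. It is not how ispell.el or Aspell work internally; it only models
the idea of checking each same-script fragment against its own dictionary.
The word lists, the function names, and the assumption that a single
Cyrillic letter is accepted as a word are all hypothetical.

```python
import unicodedata

# Toy word lists standing in for real dictionaries (hypothetical data).
# Assumption: a lone letter such as "ы" is accepted by the Russian checker.
ENGLISH = {"foo", "bar"}
RUSSIAN = {"бар", "ы"}


def script_of(ch):
    """Crudely classify a character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def script_runs(word):
    """Split a word into maximal runs of characters from a single script."""
    runs = []
    for ch in word:
        script = script_of(ch)
        if runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


def per_script_ok(word):
    """Model of the two-pass approach: each fragment is checked separately."""
    dicts = {"latin": ENGLISH, "cyrillic": RUSSIAN}
    return all(text.lower() in dicts.get(script, set())
               for script, text in script_runs(word))


def whole_word_ok(word):
    """Check the word as a single unit against both word lists."""
    return word.lower() in (ENGLISH | RUSSIAN)


for typo in ("fooЫbar", "fooбар"):
    print(typo, script_runs(typo), per_script_ok(typo), whole_word_ok(typo))
```

Under these assumptions both typos pass the fragment-by-fragment check,
while checking each word as a whole flags them.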

So even if we consider your report as a feature request, it is not
entirely clear to me how to implement such a feature.  And frankly,
since at least one spell-checker exists which supports multiple
dictionaries, it is not clear to me why we should try so hard to force
Aspell to look as if it did, too.

> The ideal practical fix for this should spellcheck such lines word by word.

I think I showed above why such a simplistic strategy will backfire by
leaving some typos undetected.




