Re: [lmi] Strip markup for spell checking?


From: Greg Chicares
Subject: Re: [lmi] Strip markup for spell checking?
Date: Thu, 29 Oct 2015 15:12:04 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0

On 2015-10-28 17:19, Vadim Zeitlin wrote:
> On Wed, 28 Oct 2015 15:47:15 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> I'm looking for a way to check spelling in XSL files, excluding markup.
[...]
>  I don't know of any tools specifically for spell checking XSL, but I think
> any tool usable with XML should do and AFAIK many of them exist and aspell
> does seem to support this and so does hunspell.

Is one preferable to the other? Let's see...

https://wiki.ubuntu.com/ConsolidateSpellingLibs
| hunspell is the most modern implementation and considered the best choice
| in the free software world.

http://fedoraproject.org/wiki/Releases/FeatureDictionary
| Fix the proliferation of dictionaries in the OS.
...
| This is complete, all major applications and default GNOME/KDE spell
| checking now goes through hunspell.

https://lists.gnu.org/archive/html/aspell-announce/2011-09/msg00000.html
| For a long time I thought about ways to regain Aspell status as the
| standard system spell checker, but after giving it a lot of through I
| have decided that this goal that is no longer worth pursuing. ...
| I thought that ... I could ... convince Linux distributions to consider
| Aspell over Hunspell as the one true spell checker; however after many
| years, I finally decided that it wasn't worth it.

It seems clear that 'hunspell' wins, although maybe Debian didn't get
that memo...

/home/greg[0]$uname --all
Linux turgon 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2+deb7u2 x86_64 GNU/Linux
/home/greg[0]$whence aspell
/usr/bin/aspell
/home/greg[0]$whence hunspell
/home/greg[1]$aptitude show hunspell |head -2
Package: hunspell
State: not installed

...but I can just install it:

# apt-get install hunspell

>  But I'd just like to say how I do it, which is definitely very low
> technological but works well for me: I open the file in Vim and do "set
> spell", then use "]s" to go the next spelling error, correct it (usually by
> just pressing "1z=" to select the first suggested replacement), press "]s"
> again and so on.
> 
>  Unfortunately in this particular case, it doesn't work well out of the box

I seek an easy method for people who don't know vim.

>  After doing this I could spell check the entire file and, in addition to
> the typo you found, only found one other one and that one in a comment:
[...]
> -      <!-- The data to be diplayed in the pages, cover page first -->
> +      <!-- The data to be displayed in the pages, cover page first -->

That's been there ever since the file was added to svn:
  http://svn.savannah.nongnu.org/viewvc?view=rev&root=lmi&revision=696
I wonder how I missed it in my initial message in this thread. Oh...I
filtered with "sed -e'/<.*>/d'", which deletes every line that contains
a tag, comment lines included.

hunspell's '-H' also seems to remove <!-- comments -->, so I'll avoid
that. And hunspell has widely-reported problems with apostrophes, so
I'll filter them. Removing just a few words that I know are okay, I
come to this casual but useful command:

< /opt/lmi/src/lmi/nasd.xsl sed -e'/<[^!].*>/d' \
  | hunspell -L | tr --delete "'" | hunspell | sed -e'/^&/!d' \
  -e'/^& \(MEC\|Sep\|nbsp\|DOCTYPE\|stylesheet\|xsl\|[Cc]hicares\) /d'

& nasd 6 9: ands, sand, NASDAQ, NASA, nasty, nosed
& xA0 3 17: Alexa, Xmas, Xian
& diplayed 7 26: displayed, played, diplomaed, display, dismayed, employed, swordplayer
& inital 9 61: initial, in ital, in-ital, genital, Vinita, Ritalin, Italian, Intel, entail

It caught both "diplayed" and "inital". The other lines are okay.
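
The difference between the two sed filters (the one from my first
message and the one in the command above) shows up on a tiny sample.
This is only a sketch: the three-line input is made up rather than
taken from any lmi file, and the function names are invented here.

```shell
# Made-up sample: one markup line, one comment line, one text line.
sample='<xsl:template match="/">
<!-- The data to be diplayed -->
plain text'

# Earlier filter: deletes every line containing '<...>', so the
# comment line is lost along with the markup.
strip_all_tags() { sed -e '/<.*>/d'; }

# Filter used above: deletes a line only if some '<' is followed by a
# character other than '!', so a line holding just a comment survives.
strip_non_comment_tags() { sed -e '/<[^!].*>/d'; }

printf '%s\n' "$sample" | strip_all_tags
printf '%s\n' "$sample" | strip_non_comment_tags
```

Only the second filter passes the comment line through to the spell
checker. A line that mixes a comment with an ordinary tag would still
be dropped by both.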

>  If you'd like to automate this to ensure that new typos don't get checked
> in, I think aspell is still the best solution.

Yes, that's the goal.
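
For automation, the '&' lines would have to become an exit status. A
minimal sketch, assuming the "& word count offset: suggestions" layout
seen in the output above; 'flagged_words' is a name made up for this
example, and the sample lines are copied from that output with the
suggestion lists shortened.

```shell
# Reads spell-checker output on stdin, prints each flagged word once,
# and returns nonzero if any '&' line was seen. Lines that do not
# start with '& ' are ignored.
flagged_words() {
  awk '/^& / { if (!seen[$2]++) print $2; n++ } END { exit (n > 0) }'
}

# Sample input: two '&' lines from above, plus a '*' line.
sample='& diplayed 7 26: displayed, played
*
& inital 9 61: initial, in ital'

if printf '%s\n' "$sample" | flagged_words; then
  echo "no misspellings"
else
  echo "misspellings found"
fi
```

Testing the function inside 'if' keeps the nonzero status from
aborting a script that runs under 'set -e'.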

Let's try the command above with 'illustration_reg.xsl'. Manually
filtering the output, it identifies:

& diplayed 7 26: displayed, played, diplomaed, display, dismayed, employed, swordplayer
& unaffilliated 7 16: unaffiliated, affiliated, affiliate, unaffectionate, unillustrated, overinflated, unflappability
& guaranted 9 18: guaranteed, guarantied, guarantee, guarantor, guaranty, quarantined, warranted, granddaddy, granddad

And in 'fo_common.xsl':

& acces 12 5: aces, access, acnes, acres, acmes, aches, accedes, accuses, accepts, accents, accent, accede
& differencies 7 51: differences, difference's, difference, differentness, differentiates, differential, interference
& paranteses 6 8: parentheses, separateness, separates, guarantees, printers, Prentiss
& Simlpy 4 2: Simply, Simplify, Smallpox, Simla
& recursivly 7 35: recursively, recursive, cursively, recursion, recessively, reflexivity, aggressively
& adjucent 7 44: adjacent, adjustment, adjutant, adjunct, antecedent, adjacency, adjusted
  [I would have suggested "adjuvant".]
& recursivly 7 24: recursively, recursive, cursively, recursion, recessively, reflexivity, aggressively
& splitted 11 66: spitted, slitted, splatted, splinted, splitter, splittable, splintered, splitting, splattered, exploited, splatter
& appox 5 96: approx, APO, pox, apex, Ampex

...though I imagine those are commentary in "common" macros, which
shouldn't affect the PDF.

This is already useful. Refinements to the command I cobbled together
are welcome.
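
One possible refinement is to wrap the command in a function so that
several files can be checked in one go. This is a sketch only:
'check_xsl' is a hypothetical name, the pipeline and the list of
accepted words are copied unchanged from the command above, and it
relies on the same GNU sed and tr extensions.

```shell
# Hypothetical wrapper: run the pipeline from this message on each
# file named on the command line, printing a header before each.
check_xsl() {
  for f in "$@"; do
    printf '== %s\n' "$f"
    < "$f" sed -e '/<[^!].*>/d' \
      | hunspell -L | tr --delete "'" | hunspell | sed -e '/^&/!d' \
        -e '/^& \(MEC\|Sep\|nbsp\|DOCTYPE\|stylesheet\|xsl\|[Cc]hicares\) /d'
  done
}

# Example invocation (not run here):
#   cd /opt/lmi/src/lmi && check_xsl nasd.xsl illustration_reg.xsl fo_common.xsl
```

The output for each file would still need the manual filtering
described above.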



