From: Thomas Huth
Subject: Re: [RFC PATCH] Add new build target 'check-spelling'
Date: Mon, 31 Oct 2022 11:50:00 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0

On 31/10/2022 11.44, Stefan Weil wrote:
> On 31.10.22 at 08:52, Thomas Huth wrote:
>> On 31/10/2022 08.43, Stefan Weil wrote:
>>> `make check-spelling` can now be used to get a list of spelling
>>> errors. It uses the latest version of codespell, a spell checker
>>> implemented in Python.
>>>
>>> Signed-off-by: Stefan Weil <sw@weilnetz.de>
>>> ---
>>>
>>> This RFC can already be used for manual tests, but still reports
>>> false positives, mostly because some variable names are interpreted
>>> as words. These words can either be ignored in the check, or in some
>>> cases the code might be changed to use different variable names.
>>>
>>> The check currently only skips a few directories and files, so for
>>> example checked out submodules are also checked.
>>>
>>> The rule can be extended to allow user provided ignore and skip
>>> lists, for example by introducing Makefile variables
>>> CODESPELL_SKIP=userfile or CODESPELL_IGNORE=userfile. A limited check
>>> could be implemented by providing a base directory
>>> CODESPELL_START=basedirectory, for example CODESPELL_START=docs.
>>>
>>> Regards,
>>> Stefan
>> [...]
>>
>> I like the idea, but I think it's unlikely that we can make this work
>> for the whole source tree any time soon. So maybe it makes more sense
>> to start with a few directories first (e.g. docs/), and then the
>> maintainers can opt in by cleaning up their directories first and then
>> adding their directories to this target here?
>>
>>  Thomas
>
> Even without implementing CODESPELL_START as described above, the
> script can already be used and integrated into CI scripts.
>
> It takes about 60 seconds to check the whole source tree including
> submodules on my (slow) virtual machine.
>
> The resulting output has about 20000 lines or 1272 KiB. It can be
> filtered for relevant parts of the source tree or used for a summary.
>
> Sample script:
>
>     grep "^[.]" spellcheck.log | sed s/^..// | sed 's/\/.*//' | sed s/:.*// | sort | uniq -c
>
> This produces a summary for the top level hierarchy of files and
> directories:
>
>       3 accel
>       1 audio
>       1 backends
>      77 block
>       7 block.c
>      20 bsd-user
>     386 capstone
>      12 chardev
>       1 configure
>       8 contrib
>       6 crypto
>      64 disas
>      32 docs
>      31 dtc
>       8 fpu
>       1 gdbstub
>       1 gdb-xml
>       1 .github
>     537 hw
>       7 inc
>     114 include
>       1 libdecnumber
>      33 linux-user
>       1 MAINTAINERS
>     150 meson
>       6 meson.build
>      16 migration
>       1 nbd
>       5 net
>      12 pc-bios
>       7 python
>       3 qapi
>       2 qemu
>       5 qemu-options.hx
>      22 qga
>   14175 roms
>      43 scripts
>       3 semihosting
>      18 slirp
>       2 softmmu
>      59 subprojects
>     504 target
>       6 tcg
>       3 test.rb
>     175 tests
>       6 tools
>      20 ui
>       8 util
>
> It shows that "roms" contributes by far the most typos. Omitting it
> would reduce the required time to 22 seconds and greatly reduce the
> number of typos found (2947 lines of output).

"roms" mostly consists of third-party submodules that we do not have direct control of. I think this should definitely be omitted.
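Until the target itself skips "roms", an existing log can be summarized without it. The following is only a minimal sketch, assuming the log lines have the form "./path/to/file:LINE: typo ==> fix" that the sample script above implies; the function name summarize_typos and its argument layout are made up for illustration and are not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper: count codespell hits per top-level file or directory,
# leaving out the top-level names given after the log file.
# Assumption: hit lines start with "./", e.g. "./hw/foo.c:12: teh ==> the".
summarize_typos() {
    log=$1
    shift
    excludes="$*"    # remaining arguments, e.g. "roms capstone"
    awk -v excludes="$excludes" '
        BEGIN {
            # Turn the space-separated exclude list into a lookup table.
            n = split(excludes, names, " ")
            for (i = 1; i <= n; i++)
                skip[names[i]] = 1
        }
        index($0, "./") == 1 {
            top = substr($0, 3)        # drop the leading "./"
            sub("[/:].*", "", top)     # keep only the first path component
            if (!(top in skip))
                count[top]++
        }
        END {
            for (top in count)
                printf "%7d %s\n", count[top], top
        }
    ' "$log" | LC_ALL=C sort -k2
}
```

With the log from the sample script this could be called as `summarize_typos spellcheck.log roms capstone`. It only filters the report after the fact, so it does not shorten the time codespell itself needs.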
> "capstone" (which has no entry in MAINTAINERS)
That's likely because it was a submodule that was removed a while ago. "rm -rf capstone" should solve that issue in your local build tree ;-)
(yes, that's another nuisance of submodules - the checked out files don't go away when the submodule gets removed)
Thomas