qemu-devel

Re: [RFC PATCH] Add new build target 'check-spelling'


From: Thomas Huth
Subject: Re: [RFC PATCH] Add new build target 'check-spelling'
Date: Mon, 31 Oct 2022 11:50:00 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0

On 31/10/2022 11.44, Stefan Weil wrote:
On 31.10.22 at 08:52, Thomas Huth wrote:

On 31/10/2022 08.43, Stefan Weil wrote:
`make check-spelling` can now be used to get a list of spelling errors.
It uses the latest version of codespell, a spell checker implemented in Python.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
---

This RFC can already be used for manual tests, but still reports false
positives, mostly because some variable names are interpreted as words.
These words can either be ignored in the check, or in some cases the code
might be changed to use different variable names.

The check currently only skips a few directories and files, so, for example,
checked-out submodules are also checked.
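One possible way to keep checked-out submodules out of the check, sketched here as an assumption rather than what the patch does, is to derive a skip list from the "path = ..." entries in .gitmodules and pass it to codespell's --skip option. The function name submodule_skip_list is hypothetical, and it cannot catch directories left behind by submodules that have since been removed.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of submodule paths,
# taken from the "path = ..." lines of a .gitmodules file, in a form
# suitable for codespell's --skip option. Paths are prefixed with "./"
# on the assumption that codespell is run on "." from the source root.
submodule_skip_list() {
    gitmodules=${1:-.gitmodules}
    # Keep only "path = <dir>" lines and strip everything up to the value,
    # then prefix "./" and join all results with commas.
    sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$gitmodules" |
        sed 's|^|./|' |
        paste -s -d, -
}

# Usage (assumption: run from the top of the source tree):
#   codespell --skip "$(submodule_skip_list)" .
```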

The rule can be extended to allow user-provided ignore and skip lists,
for example by introducing Makefile variables CODESPELL_SKIP=userfile
or CODESPELL_IGNORE=userfile. A limited check could be implemented by
providing a base directory CODESPELL_START=basedirectory, for example
CODESPELL_START=docs.
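A minimal sketch of how the proposed variables might map onto codespell options, assuming CODESPELL_SKIP names a file with one glob per line, CODESPELL_IGNORE a file with one word per line, and CODESPELL_START a subdirectory. The variables are only proposals, codespell_cmd is a hypothetical name, and the default skip list (./.git plus ./roms, following the suggestion later in this thread) is an assumption. It prints the command line instead of running it.

```shell
#!/bin/sh
# Sketch only: CODESPELL_SKIP, CODESPELL_IGNORE and CODESPELL_START are
# the proposed (not yet existing) Makefile variables.
#   CODESPELL_SKIP   file with one glob per line to skip
#   CODESPELL_IGNORE file with one word per line to ignore
#   CODESPELL_START  directory to check instead of the whole tree
codespell_cmd() {
    # Always-skipped paths (assumption: .git plus roms).
    skip="./.git,./roms"
    if [ -n "$CODESPELL_SKIP" ] && [ -r "$CODESPELL_SKIP" ]; then
        # --skip expects a comma-separated list, so join the non-empty
        # lines of the user's file with commas.
        user_skip=$(grep -v '^[[:space:]]*$' "$CODESPELL_SKIP" | paste -s -d, -)
        if [ -n "$user_skip" ]; then
            skip="$skip,$user_skip"
        fi
    fi
    set -- codespell --skip "$skip"
    if [ -n "$CODESPELL_IGNORE" ]; then
        # --ignore-words takes the file name directly.
        set -- "$@" --ignore-words "$CODESPELL_IGNORE"
    fi
    # Check only the requested subtree, or the whole tree by default.
    set -- "$@" "${CODESPELL_START:-.}"
    printf '%s\n' "$*"
}
```

Replacing the final printf with "$@" would run codespell instead of printing the command.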

Regards,
Stefan
[...]
I like the idea, but I think it's unlikely that we can make this work for the whole source tree any time soon. So maybe it makes more sense to start with a few directories first (e.g. docs/), and then maintainers can opt in by cleaning up their directories and adding them to this target?
 Thomas

Even without implementing CODESPELL_START as described above, the script can already be used and integrated into CI scripts.
It takes about 60 seconds to check the whole source tree, including submodules, on my (slow) virtual machine.
The resulting output has about 20000 lines (1272 KiB). It can be filtered for relevant parts of the source tree or used for a summary.
Sample script:

    grep "^[.]" spellcheck.log | sed 's/^..//' | sed 's/\/.*//' | sed 's/:.*//' | sort | uniq -c

This produces a summary for the top-level hierarchy of files and directories:

       3 accel
       1 audio
       1 backends
      77 block
       7 block.c
      20 bsd-user
     386 capstone
      12 chardev
       1 configure
       8 contrib
       6 crypto
      64 disas
      32 docs
      31 dtc
       8 fpu
       1 gdbstub
       1 gdb-xml
       1 .github
     537 hw
       7 inc
     114 include
       1 libdecnumber
      33 linux-user
       1 MAINTAINERS
     150 meson
       6 meson.build
      16 migration
       1 nbd
       5 net
      12 pc-bios
       7 python
       3 qapi
       2 qemu
       5 qemu-options.hx
      22 qga
   14175 roms
      43 scripts
       3 semihosting
      18 slirp
       2 softmmu
      59 subprojects
     504 target
       6 tcg
       3 test.rb
     175 tests
       6 tools
      20 ui
       8 util

It shows that "roms" contributes by far the most typos. Omitting it would reduce the run time to 22 seconds and greatly reduce the number of reported typos (2947 lines of output instead of about 20000).
"roms" mostly consists of third-party submodules over which we have no direct control. I think it should definitely be omitted.
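As a hedged illustration of the effect of omitting "roms", the summary one-liner above can be wrapped in a small function that drops one top-level entry before counting. summarize_typos is a hypothetical name, and the log format assumed is codespell's usual "./path:line: word ==> suggestion" output.

```shell
#!/bin/sh
# Sketch: per-directory typo summary that excludes one top-level entry
# (default "roms"). Assumes codespell output lines of the form
# "./path/to/file:LINE: typo ==> fix", as in spellcheck.log.
summarize_typos() {
    log=$1
    exclude=${2:-roms}
    grep '^[.]/' "$log" |
        sed 's|^\./||' |             # drop the leading "./"
        sed 's|/.*||' |              # keep only the top-level directory
        sed 's|:.*||' |              # or the file name for top-level files
        grep -v -x -F "$exclude" |   # drop the excluded entry
        sort | uniq -c
}
```

Passing a different second argument, e.g. capstone, drops that entry instead.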
"capstone" (which has no entry in MAINTAINERS)
That's likely because it was a submodule that was removed a while ago. "rm -rf capstone" should solve that issue in your local build tree ;-)
(yes, that's another nuisance of submodules: the checked-out files don't go away when the submodule gets removed)
 Thomas
