Speed up spellchecking by ignoring whitespace-only lines
The new API has introduced extra overhead per line being spellchecked.
One way to reduce this overhead is to spellcheck fewer lines. An
obvious choice is to skip empty and whitespace-only lines, since they
cannot contain any typos (on account of not having any words).
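
The filtering described above can be sketched in isolation as follows. This is a minimal illustration, not the actual `parse_file` code: the helper name `lines_to_check` is hypothetical, and `exclude_lines` is assumed to be a set of already right-stripped lines.

```python
from typing import Iterable, Iterator


def lines_to_check(
    lines: Iterable[str], exclude_lines: set[str]
) -> Iterator[tuple[int, str]]:
    """Yield (index, stripped line) for lines worth spellchecking.

    Hypothetical helper for illustration; it mirrors the loop in the diff
    but is not part of codespell itself.
    """
    for i, line in enumerate(lines):
        # Strip once and reuse the result for both checks.
        line = line.rstrip()
        # Empty and whitespace-only lines contain no words, so skip them
        # along with explicitly excluded lines.
        if not line or line in exclude_lines:
            continue
        yield i, line
```

The original line index is preserved, so any reporting that depends on line numbers is unaffected by the skipped lines.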

A side-effect of this change is that we now spellcheck lines with
trailing whitespace stripped. Semantically, this gives the same result
(per "whitespace never has typos"). Performance-wise, it is faster in
theory because the strings are now shorter (since we were calling
`.rstrip()` anyway). In practice, I am not sure we are going to find
any real corpus where the trailing whitespace is noteworthy from a
performance point of view.
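
The "whitespace never has typos" argument can be checked with a small sketch: extracting words from a line gives the same result whether or not trailing whitespace is stripped first. The word pattern below is a stand-in chosen for illustration and is an assumption, not necessarily the pattern codespell uses.

```python
import re

# Stand-in word pattern (an assumption for this sketch).
WORD_RE = re.compile(r"[\w\-']+")


def words(line: str) -> list[str]:
    """Return the word-like tokens found in a line."""
    return WORD_RE.findall(line)


samples = ["teh quick fox   \n", "no trailing space", "   \t\n", ""]
for sample in samples:
    # Trailing whitespace contributes no tokens, so stripping is a no-op
    # as far as the extracted words are concerned.
    assert words(sample) == words(sample.rstrip())
```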

On the performance corpus from codespell-project#3491, this takes out ~0.4s of
runtime, bringing us down to slightly above the 5.6s that was the
baseline.
nthykier committed May 17, 2024
1 parent 39e2921 commit 96f1b65
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion codespell_lib/_codespell.py
@@ -881,7 +881,8 @@ def parse_file(
     )

     for i, line in enumerate(lines):
-        if line.rstrip() in exclude_lines:
+        line = line.rstrip()
+        if not line or line in exclude_lines:
             continue

         extra_words_to_ignore = set()
