pre-commit pylint checks are doing too much work #121
@jepler I'm not entirely sure I follow. We're using
To better show what I mean, suppose you change the pylint to a "teapot check", which
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index cce4c7b..1b8ebe4 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -38,5 +38,5 @@ repos: pylint (tests code)
description: Run pylint rules on "tests/*.py" files
entry: /usr/bin/env bash -c
- args: ['([[ ! -d "tests" ]] || for test in $(find . -path "./tests/*.py"); do pylint --disable=missing-docstring $test; done)']
+ args: ['echo "I am a teapot"; echo "files:"; echo "$@"; exit 1', '-']
language: system
Then run the pre-commit checks:
What is going on here is, pre-commit has noticed that my system has multiple cores, and it believes (because we have not told it otherwise) that When
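For readers unfamiliar with the `bash -c` convention the teapot hook relies on: the first argument after the script string becomes `$0`, and everything after it becomes `"$@"`. That is why the hook's `args` list ends with `'-'`, so the filenames pre-commit appends end up in `"$@"`. A minimal sketch, assuming the hypothetical filenames `a.py` and `b.py` stand in for whatever pre-commit would pass:

```shell
# "-" fills $0; a.py and b.py (hypothetical filenames) become "$@".
out=$(bash -c 'echo "I am a teapot"; echo "files:"; echo "$@"' - a.py b.py)
printf '%s\n' "$out"
```

Running the same command without any trailing filenames leaves the last line empty, which is an easy way to see what a hook was actually handed.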
Since the last round of pre-commit changes, we've discovered a few things about how better to use pylint from pre-commit. However, as the impact of this is actually rather modest (just some extra CPU time spent), it's fine to leave until we have another adabot patch we want to apply to all the repositories, especially since I feel like my own ideas about this are still a bit in flux as I learn more.

pre-commit Parallelism

By default, pre-commit parallelizes each check that it runs. It makes a list of files, divides it into a number of groups of files, and then invokes the command multiple times, possibly in parallel. This has several consequences that we care about:

Parallelism & the main package

If the main package consists of enough files, the "pylint (library code)" step will invoke the pylint command itself multiple times, each with a subset of the files. This can inhibit duplicate code checking, or possibly even cause duplicate code checking to give different results depending on the number of cores a developer's system has.

Resolution: Add `require_serial: true` to the hook, as in the patch below. This worked as expected for a single-file library (adafruit_datetime) and a package (jepler_udecimal) even when a total of 105 files were within the package.

Parallelism, tests, & examples

A different problem existed with the pylinting of tests and examples. Because these had a command (

pylint duplicate-code checking

Pylint always had a duplicate code check, but due to bugs it was historically ineffective. Starting perhaps with pylint 2.7.0, the check became effective again. We don't want to apply the duplicate code check to tests and examples, so we used a workaround where we invoked pylint separately for each individual file.

Several ways to disable the check within pylint were investigated, but initially the one effective way was not found.

Not working:
Working:
Once we have an effective way to disable duplicate-code checking on just the files we want, we can change from using a "local hook" to using a "pylint hook". pre-commit includes adequate filters ("types", "exclude" and "files") to let us run three different versions of the pylint hook on three different subsets of files.

It's worth noting that it's safe to have

How it's working out

I initially noticed this in adafruit_datetime, where it felt like running the

Before this change,

Then another wrinkle occurred…

Duplicate code checking and the main pylint step still aren't right. Well, they're right when you run

The only alternative I'm aware of would be to return to the "local hook", but this time for the library/package itself, to ensure that pylint is run on all files all the time. Using "exclude + types" directives could still allow this step to run only when something in library/package is modified.

Patch

Here's what I'm testing locally. However, it doesn't incorporate any fix for the last item.

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index cce4c7b..a8b58f9 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -24,19 +24,14 @@ repos:
name: pylint (library code)
types: [python]
exclude: "^(docs/|tests/|examples/|setup.py$)"
-- repo: local
- hooks:
- - id: pylint_examples
- name: pylint (examples code)
- description: Run pylint rules on "examples/*.py" files
- entry: /usr/bin/env bash -c
- args: ['([[ ! -d "examples" ]] || for example in $(find . -path "./examples/*.py"); do pylint --disable=missing-docstring,invalid-name $example; done)']
- language: system
-- repo: local
- hooks:
- - id: pylint_tests
- name: pylint (tests code)
- description: Run pylint rules on "tests/*.py" files
- entry: /usr/bin/env bash -c
- args: ['([[ ! -d "tests" ]] || for test in $(find . -path "./tests/*.py"); do pylint --disable=missing-docstring $test; done)']
- language: system
+ require_serial: true # so that duplicate code checking is the same on all systems
+ - id: pylint
+ name: pylint (example code)
+ types: [python]
+ files: "^examples/"
+ args: ["--disable=missing-docstring,invalid-name,duplicate-code"]
+ - id: pylint
+ name: pylint (test code)
+ types: [python]
+ files: "^tests/"
+ args: ["--disable=missing-docstring,duplicate-code"]
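To see how the `exclude` and `files` patterns in the patch above carve a repository into three disjoint groups, the patterns can be tried against a list of paths. This is only a rough sketch: the paths are hypothetical, `grep -E` stands in for pre-commit's own regex matching, and the `types: [python]` filter is not emulated.

```shell
# Hypothetical repository contents, one path per line.
paths='adafruit_example.py
examples/demo.py
tests/test_demo.py
setup.py
docs/conf.py'

# "pylint (library code)": everything NOT matched by the exclude pattern.
library=$(printf '%s\n' "$paths" | grep -Ev '^(docs/|tests/|examples/|setup.py$)')
# "pylint (example code)" and "pylint (test code)": the files patterns.
examples=$(printf '%s\n' "$paths" | grep -E '^examples/')
tests=$(printf '%s\n' "$paths" | grep -E '^tests/')

printf 'library: %s\nexamples: %s\ntests: %s\n' "$library" "$examples" "$tests"
```

In this sketch `setup.py` and `docs/conf.py` fall into none of the three groups, which matches the intent of the exclude pattern.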
Any progress on this? I'm getting pre-commit errors on an example when only changing README.rst. I'll follow up to fix the example lint in a subsequent commit.

We should probably switch to using an external repo for pre-commit checks: https://pre-commit.com/#plugins

That way we'd be able to have the code to execute it in one spot instead of many.
I have a possible improvement for the examples and tests pylint hooks.

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 354c761..6da32d6 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -27,8 +27,9 @@ repos:
- repo: local
hooks:
- id: pylint_examples
+ require_serial: true
name: pylint (examples code)
description: Run pylint rules on "examples/*.py" files
entry: /usr/bin/env bash -c
- args: ['([[ ! -d "examples" ]] || for example in $(find . -path "./examples/*.py"); do pylint --disable=missing-docstring,invalid-name $example; done)']
+ args: ['([[ ! -d "examples" ]] || pids=();for ex in $(find . -path "./examples/*.py");do pylint --disable=missing-docstring,invalid-name $ex & pids+=($!); done; exval=0; for pid in ${pids[*]}; do wait $pid; exval=$(($exval+$?)); done; exit $exval)']
language: system

The bash command broken out:

pids=();
for ex in $(find . -path "./examples/*.py"); do
# Run in parallel and collect the pids
pylint --disable=missing-docstring,invalid-name $ex & pids+=($!);
done;
exval=0;
for pid in ${pids[*]}; do
# Wait for the pids to finish and collect the exit values
wait $pid;
exval=$(($exval+$?));
done;
# Return a combined exit value
exit $exval

Reference for the bash idea: https://unix.stackexchange.com/a/595838

This was tested on a Raspberry Pi 4 (4 cores) on a library with 10 files in the examples directory with the following results:

My Bash scripting skills are negligible and this script is beyond the skill level I feel I have. Assuming this is repeatable, and works on GitHub (I've only tested locally), hopefully this can be the basis for speeding up workflows.
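The start-in-background, then wait-and-collect pattern from the script above can be tried in isolation. The following is a small sketch, not the hook itself: `true` and `false` are stand-ins for per-file pylint runs that pass or fail, and a single 0/1 flag is recorded instead of a sum. That last choice is an assumption about what is wanted; it avoids the corner case where summed exit codes wrap around, since the shell keeps only the low 8 bits of an exit status.

```shell
# Stand-ins for per-file pylint runs: "true" passes, "false" fails.
pids=""
for cmd in true false true; do
    "$cmd" &              # start the job in the background
    pids="$pids $!"       # remember its process ID
done

failed=0
for pid in $pids; do
    # wait returns the exit status of that background job
    wait "$pid" || failed=1
done
echo "failed=$failed"
```

A hook built this way would finish with `exit $failed`, so any single failing file fails the whole step.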
Removing myself as an assignee because I cannot resolve this myself, and should not have been assigned in the first place.
I noticed that in adafruit_datetime, the pre-commit check could take a long time, especially for the "tests" step. Furthermore, all 4 of my CPU cores were active.
I believe this is because by default, pre-commit
Since pylint needs to get a view of all the source files it's checking in order to do proper code duplication checks, we make our own list of files to pylint with `find` and ignore the positional arguments that are given. But unless we also specify `pass_filenames: false`, pre-commit doesn't know about it and starts invoking the "pylint all files" command once for each file!

This change is one I'm testing locally in adafruit_datetime:
If we want to make a change like this we'll have to apply it with adabot to existing repos as well.
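The redundant work described above can be reproduced without pre-commit. In this sketch, `xargs -n 2` plays the role of a runner that splits the file list into batches, and the hook is a stand-in that ignores what it is handed and "checks everything" each time. The filenames and the batch size of two are arbitrary assumptions; pre-commit chooses its own batches.

```shell
# Stand-in for a hook that ignores the filenames it receives and
# always checks everything (like the find-based pylint command).
hook='echo "checking ALL files (was handed: $*)"'

# Emulate a runner that splits four hypothetical files into batches of two
# and calls the hook once per batch.
output=$(printf '%s\n' a.py b.py c.py d.py | xargs -n 2 sh -c "$hook" -)
runs=$(printf '%s\n' "$output" | grep -c .)

printf '%s\n' "$output"
echo "hook ran $runs times"
```

With `pass_filenames: false`, pre-commit passes no filenames at all, so a hook like this is invoked a single time.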