Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a class for manipulating repo data #20

Draft
wants to merge 10 commits into
base: bk-linting
Choose a base branch
from

Conversation

domoscargin
Copy link
Contributor

@domoscargin domoscargin commented Jan 3, 2025

Resolves #13, resolves #12, resolves #10, resolves #9

Example action: https://github.com/alphagov/design-system-github-stats/actions/runs/12678615737

And PR: #23

Changes

Adds neostandard for linting.

  • standardjs appears to be at a bit of a stalemate in terms of governance. neostandard is an open governance fork, which some of the standard team seem willing to fork back into mainline standard once the governance thing is worked out, so we can keep an eye on that.

Adds vitest for testing

  • Mostly because I wanted to try vitest, and it seems nice and speedy and plays nice with ESM.
  • Tests are a bit patchy at the moment, but I've done a few basic ones

Abstracts Octokit

  • Adds an Octokit helper file

Adds a RepoData class

  • Simplifies the build script and makes it easier to unit test things
  • Responsible for storing result data, running checks and gathering the repo data

Updates the script behaviour and return data

  • Now writes to file in batches, rather than each time we analysed a repo
  • Uses the RepoData class extensively
  • Uses a free GraphQL query to fetch initial information so the script speed is faster, and we're not close to getting to our rate limit
  • Now searches for ALL package.json files, in case they're not at root, and/or there's more than one of them
    • For lockfiles, we try root and the same directory as each of the package files
  • Returns a more detailed suite of dependency info
    • Will find and output all govuk-frontend versions which are direct dependencies in directDependencyVersions
    • Will find a single lockfile govuk-frontend version and output in lockfileFrontendVersion
    • Will output a latestFrontendVersion which is the highest of any collected versions.
    • If govuk-frontend is an indirect dependency, stores information about its parents in parentDependency

Dead ends

I did a lot of thinking about how to reduce the script time and API calls. The free GraphQL request for created at, pushed at and latest commit I've implemented is the biggest enhancement, and batch writing to files reduces I/O a lot. But there were some dead ends:

  • I investigated using the dependency graph API, but it's a bit underbaked at the moment, and there's no easy way to get the full dependency tree for dependents. I think GitHub are working on this API though, so there may be methods in the future that allow us to skip most of our script.
  • I also looked at the search API as an easier way to easily find govuk-frontend dependencies in relevant files, but the rate limit is severely restrictive (10 calls per minute for code search). Could potentially be more useable if we're doing caching, but probably not a goer.
  • I tried to get the repo tree during the initial GraphQL query. Getting the root file list is a very light cost, but there's no easy way to recurse through the file structure. Whereas the REST API just takes 1 call to give you the full structure.

@domoscargin domoscargin marked this pull request as ready for review January 3, 2025 11:20
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 11 times, most recently from 4d1d55e to 5c95154 Compare January 5, 2025 21:39
@domoscargin
Copy link
Contributor Author

domoscargin commented Jan 6, 2025

🐛 BUG

Parent dependencies: these are all null in the latest dataset. Suspect this is because we send the "this is an indirect dependency" examples straight to finding a lockfile version, which will generally be successful and find the govuk-frontend dependency at top level, rather than digging in the sub-levels. Probably makes sense to just fold the parent dependency in with this data, to be honest, rather than have a special path just for "indirect" dependencies.

SOLVED: 47fec86

@domoscargin domoscargin force-pushed the bk-repo-data-class branch 5 times, most recently from 0871587 to 7aeda71 Compare January 6, 2025 21:52
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 2 times, most recently from e6009c5 to 11c43e4 Compare January 9, 2025 16:22
@domoscargin domoscargin marked this pull request as draft January 9, 2025 19:48
@domoscargin domoscargin changed the base branch from process-all-data to main January 13, 2025 00:37
@domoscargin domoscargin changed the base branch from main to bk-linting January 13, 2025 00:40
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 11 times, most recently from 52193ed to c668408 Compare January 14, 2025 21:53
@domoscargin domoscargin force-pushed the bk-repo-data-class branch 3 times, most recently from efbb8ae to d75f66c Compare January 15, 2025 21:58
Rather hastily done, and we're not testing the Octokit class just yet, but probably better than nothing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant