SourceCred stack lookup
So I’ve started an experiment over at https://github.com/teamopen-dev/sourcecred-stack-lookup (with a lot of feedback from @nothingismagick).
The basic idea is to calculate cred scores for GitHub repositories ahead of time and host the results, so you can do a fast lookup of information about a repository without having to mirror it or calculate cred scores yourself.
This approach will save you anywhere from a few minutes to a few hours per repository, which makes aggregate analysis practical. The cost is that you can no longer tweak parameters like the weights used.
Use-case: low bus-factor risk for JavaScript projects
The version right now uses a reasonably simple interpretation of SourceCred scores to find out which of your dependencies might have a bus-factor risk.
It looks at a few things:
- Is most of the work done by a few people?
- Do the same people show up in different projects as top contributors?
- Did a lot of work go into the project?
Based on these factors, it sorts contributors into Low, Medium, High and CRITICAL impact.
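To make the three questions above a bit more concrete, here is a minimal sketch of how such a categorization could work. This is not the code the tool actually runs: the function names, the input shape (a map of repo to per-user cred) and every threshold are assumptions picked purely for illustration.

```javascript
// Hypothetical sketch: all names and thresholds below are illustrative
// assumptions, not the values used by sourcecred-stack-lookup.
const LABELS = ["Low", "Medium", "High", "CRITICAL"];

// Sum of all cred in one project, given a { user: cred } map.
function totalCred(credByUser) {
  return Object.values(credByUser).reduce((sum, s) => sum + s, 0);
}

// Fraction of a project's cred held by its top `n` contributors.
function topShare(credByUser, n) {
  const total = totalCred(credByUser);
  if (total === 0) return 0;
  const top = Object.values(credByUser)
    .sort((a, b) => b - a)
    .slice(0, n);
  return top.reduce((sum, s) => sum + s, 0) / total;
}

// Rate one contributor across all scanned projects.
// `projects` maps "owner/repo" to that repo's { user: cred } map.
function impactOf(user, projects) {
  let topProjects = 0;
  let bigProject = false;
  for (const credByUser of Object.values(projects)) {
    const total = totalCred(credByUser);
    const own = credByUser[user] || 0;
    if (total === 0) continue;
    // Is most of the work done by a few people, and is this one of them?
    const concentrated = topShare(credByUser, 3) >= 0.75;
    if (concentrated && own / total >= 0.25) {
      topProjects += 1;
      // Did a lot of work go into the project?
      if (total >= 1000) bigProject = true;
    }
  }
  // One point per question answered with "yes".
  const points =
    (topProjects >= 1 ? 1 : 0) +
    (bigProject ? 1 : 0) +
    (topProjects >= 2 ? 1 : 0); // same person, several projects
  return LABELS[points];
}
```

With these made-up thresholds, someone who holds most of the cred in two projects, at least one of them large, would come out as CRITICAL, while an occasional contributor stays at Low.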
Example: Scanning sourcecred/sourcecred
sourcecred/sourcecred$ yarn -s lookup
Collecting data for sourcecred/sourcecred/package.json
CRITICAL impact contributors at risk from bus-factor found:
- @JoshuaWise, in projects: [ 'joshuawise/better-sqlite3' ]
- @kkaefer, in projects: [ 'joshuawise/better-sqlite3' ]
- @jgm, in projects: [ 'jgm/commonmark.js' ]
- @mbostock, in projects: [
'd3/d3-scale',
'd3/d3-array',
'd3/d3-format',
'd3/d3-time-format',
'd3/d3-time',
'd3/d3-scale-chromatic'
]
- @raszi, in projects: [ 'raszi/node-tmp' ]
- @silkentrance, in projects: [ 'raszi/node-tmp' ]
- @jessebeach, in projects: [ 'evcohen/eslint-plugin-jsx-a11y' ]
- @ljharb, in projects: [ 'evcohen/eslint-plugin-jsx-a11y', 'chrisdickinson/raf' ]
- @evcohen, in projects: [ 'evcohen/eslint-plugin-jsx-a11y' ]
- @coveralls, in projects: [ 'evcohen/eslint-plugin-jsx-a11y' ]
HIGH impact contributors at risk from bus-factor found:
- @springmeyer, in projects: [ 'joshuawise/better-sqlite3' ]
- @Mithgol, in projects: [ 'joshuawise/better-sqlite3' ]
- @SGrondin, in projects: [ 'sgrondin/bottleneck' ]
- @tmpfs, in projects: [ 'jgm/commonmark.js' ]
We encourage you to make sure these contributors receive enough support.
Current NPM implementation
An NPM package is usually open source and likely to have a GitHub link in its package.json, meaning we should be able to crawl it pretty easily. So using this for an aggregate use-case made sense.
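As an illustration of why this is easy to crawl, here is a rough sketch of pulling an owner/name pair out of the `repository` field of a package.json. That field can be a shorthand string or an object with a `url`. The function name `githubRepoFromPackageJson` is made up for this example, and lowercasing the result is an assumption based on the lowercase repo names in the output above. The real client may do this differently.

```javascript
// Hypothetical helper, not part of the published client.
// Returns "owner/name" (lowercased) or null if no GitHub repo is found.
function githubRepoFromPackageJson(pkg) {
  const repo = pkg.repository;
  if (!repo) return null;
  // `repository` is either a string or an object like { type, url }.
  const raw = typeof repo === "string" ? repo : repo.url;
  if (typeof raw !== "string") return null;

  // Shorthand forms: "github:owner/name" or bare "owner/name".
  const shorthand = raw.match(/^(?:github:)?([\w.-]+)\/([\w.-]+)$/);
  // URL forms: https://, git+https://, git://, or "git@github.com:owner/name".
  // A trailing ".git" and anything after the repo name are ignored.
  const url = raw.match(
    /github\.com[/:]([\w.-]+)\/([\w.-]+?)(?:\.git)?(?:[#/].*)?$/
  );

  const m = shorthand || url;
  if (!m) return null;
  return `${m[1]}/${m[2]}`.toLowerCase();
}
```

Packages hosted elsewhere (for example a `gitlab:` shorthand) simply return null here, so they would be skipped rather than guessed at.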
Generating scores
There is a cronjob running on my server, which gradually works through a queue of projects and generates score files for them. The sourcecred data folder is kept on the server as a cache, but the score files are committed to GitHub Pages: https://github.com/teamopen-dev/sourcecred-stack-lookup/tree/gh-pages
Also included is a meta file, which lists the available scores and their last updated timestamps:
https://scsl.teamopen.dev/v0/meta.json
The client
There’s also a client, uploaded as a package on NPM: https://www.npmjs.com/package/@teamopen/sourcecred-stack-lookup
It resolves NPM package names to GitHub repos using the meta.json file, finds out which scores are currently available, and then downloads those.
(This client works both as a devDependency for node and in Notebooks, like this https://observablehq.com/@beanow/do-things-with-scsl)
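The resolve-then-check step could look roughly like the sketch below. Note that the shape of the meta object used here is purely an assumption for the sake of the example and has not been checked against the real meta.json. `planLookup` is also a made-up name, not something the client exports.

```javascript
// ASSUMED shape of the meta object (hypothetical, check the real file):
// {
//   packages: { "<npm name>": "<owner/repo>" },
//   scores:   { "<owner/repo>": { lastUpdated: <timestamp> } }
// }

// Safe own-property check, so names like "constructor" are not misread.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Split a list of NPM package names into those with scores available
// for download and those without.
function planLookup(packageNames, meta) {
  const available = [];
  const unavailable = [];
  for (const name of packageNames) {
    const repo = has(meta.packages, name) ? meta.packages[name] : null;
    if (repo !== null && has(meta.scores, repo)) {
      const { lastUpdated } = meta.scores[repo];
      available.push({ name, repo, lastUpdated });
    } else {
      // Unknown package, or known repo whose scores are not generated yet.
      unavailable.push(name);
    }
  }
  return { available, unavailable };
}
```

Only the entries in `available` would then need a download. The rest can be reported as not yet scored.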