CredRank is a new variant of the SourceCred algorithm that takes a different approach to how we handle time. In early tests, CredRank:
- improves cred analysis runtime by about 20×;
- decreases output size by about 50×;
- improves the interpretability of the algorithm, answering questions like “how much cred did this person earn this week?” with sound additive semantics; and
- improves the flexibility of the algorithm, enabling features like bounded-time employment cred, node-dependent α-values, and granular time slicing.
Furthermore, our implementation sketch for CredRank introduces a data type for “Markov process graphs” with unidirectional edges weighted by pre-normalized transition probabilities. This simplifies the graph transformation pipeline by serving as a stepping stone from the plugin graph interface to raw Markov chains.
Conceptually, the main change is to transform edges incident to scoring nodes (users) to instead pass through an family of “epoch nodes” indexed by time period. We call this process fibration (by analogy with fiber bundles). The score of an epoch node for user u in time period t represents how much cred a user earned for contributions made in that time period. Epoch nodes send a portion of their cred to their owning user, and the rest along the out-edges that were previously incident to the owner. Adjacent epoch nodes have “webbing” edges forward and backward in time, so that a contribution’s cred is slightly smeared to mitigate spoiler effects and reduce variance. We also reify the teleportation vector as a node in the graph (the “seed node”) with in-edges from most nodes (but not epoch nodes) and out-edges in proportion to node weights. Thus, a Markov process graph can be converted to a standard Markov chain whose stationary distribution can be computed without any α-parameterization at the Markov process level.
While this is more an evolution of the old algorithm than a reinvention from scratch, we’ve adopted the name CredRank to clarify that the algorithm is meaningfully different from its “classic PageRank” cousin.
CredRank enables us to run cred analysis and grain distribution with a much tighter feedback cycle by dramatically improving the efficiency of cred computation with respect to time. Rather than distributing grain on a weekly basis, we could distribute it daily or hourly.
Specifically: The amount of work spent computing stationary distributions (which is expensive) used to be linear in |V| · |T|, where |V| is the number of nodes in the graph and |T| is the number of time periods. It’s now linear in |V| + |V_scoring| · |T|, where |V_scoring| is the number of scoring nodes in the graph, which is much smaller than |V|.
CredRank enables “employment cred”, wherein a company may agree to pay a contributor to work on a project in exchange for a share of the contributor’s cred in the project earned during the period of employment. It’s still the case that cred should not be “for sale” on a market; an employment cred relationship represents an agreement at the time that the cred was minted. CredRank provides a natural way to do this safely, by simply creating multiple out-edges from epoch nodes.
Several long-time SourceCred contributors have been sponsored to work on SourceCred and would like that sponsorship to be fairly reflected.
The implementation of CredRank enables cleaning up a lot of legacy core APIs, implementations, and UIs now that we’ve found better abstractions, improving accessibility both for interested new users and for maintainers.
- Factor out any pre-work from the
wchargin-markov-process-graphsbranch (e.g., #1676).
- Productionize and write tests for the
MarkovProcessChaindata type, plus constructors (from plugin graphs) and destructors (to raw Markov chains).
- Connect the new pipeline to a command-line interface (
- Connect the new pipeline to a user-facing web UI, either adapting the legacy cred view, adapting the timeline cred view, or creating something new.
- Test against a variety of projects for efficiency and qualitative results. It’s okay and expected that distributions should change. The new distributions should be “at least as reasonable” as the old ones.
- An implementation of
MarkovProcessGraphthat’s up to SourceCred quality standards.
- A CredRank view integrated into the live web UI.
- Lessons from the initial cred algorithm and the timeline cred algorithm were important in identifying problems and shared vocabulary.
- The “
cli2” scaffolding from the new instance system was a big help in providing a low-friction integration point for the new types and analysis.