First off, there’s a ton of context here which I’m happy to discuss on a call.
As for how data flows through SourceCred, it works like this:
- Every plugin generates a SourceCred graph that is specific to its domain (e.g. GitHub createGraph, Discourse createGraph)
- The graphs are merged together into one canonical combined graph (including identity resolution)
- We then compute timeline cred on that graph using the TimelineCred class
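To make the first two steps concrete, here's a toy sketch. Everything in it is made up for illustration: `ToyGraph`, `toyGithubGraph`, `toyDiscourseGraph`, `mergeGraphs`, and the address strings are hypothetical and are not SourceCred's real Graph API. It just shows the shape of the idea: per-plugin graphs go in, and one combined graph with resolved identities comes out. The cred computation itself (the third step) is not modeled here.

```typescript
// Toy types for illustration only; not SourceCred's real Graph class.
type NodeAddress = string;
interface ToyEdge {
  src: NodeAddress;
  dst: NodeAddress;
}
interface ToyGraph {
  nodes: Set<NodeAddress>;
  edges: ToyEdge[];
}

// Step 1: each plugin builds a graph for its own domain (hypothetical data).
function toyGithubGraph(): ToyGraph {
  return {
    nodes: new Set<NodeAddress>(["github/user/alice", "github/pull/1"]),
    edges: [{src: "github/user/alice", dst: "github/pull/1"}],
  };
}

function toyDiscourseGraph(): ToyGraph {
  return {
    nodes: new Set<NodeAddress>(["discourse/user/alice", "discourse/post/7"]),
    edges: [{src: "discourse/user/alice", dst: "discourse/post/7"}],
  };
}

// Step 2: merge into one combined graph. Identity resolution is modeled as
// a lookup table that rewrites per-platform accounts to a single identity.
function mergeGraphs(
  graphs: ToyGraph[],
  aliases: Map<NodeAddress, NodeAddress>
): ToyGraph {
  const resolve = (a: NodeAddress): NodeAddress => {
    const identity = aliases.get(a);
    return identity === undefined ? a : identity;
  };
  const merged: ToyGraph = {nodes: new Set<NodeAddress>(), edges: []};
  graphs.forEach((g) => {
    g.nodes.forEach((n) => merged.nodes.add(resolve(n)));
    g.edges.forEach((e) =>
      merged.edges.push({src: resolve(e.src), dst: resolve(e.dst)})
    );
  });
  return merged;
}

const aliases = new Map<NodeAddress, NodeAddress>([
  ["github/user/alice", "identity/alice"],
  ["discourse/user/alice", "identity/alice"],
]);
const combined = mergeGraphs([toyGithubGraph(), toyDiscourseGraph()], aliases);
console.log(Array.from(combined.nodes).sort());
```

In this toy version, the two Alice accounts collapse into a single identity node, so both of her edges end up attached to the same node in the combined graph.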
The weightEvaluator is a small piece of logic within that broader orchestration. In general, I would say the api/load module is the best place to look to understand the overall flow for generating cred.
The sourcecred/cred repository is the prototype cred instance or “cred repository” where we store the results for SourceCred itself. However, it doesn’t contain the logic for computing cred, except insofar as it has a small update script which invokes SourceCred command line tools.
In general, almost all of the processing happens when creating the graph and then applying the cred algorithm to it. This process gives us timeline cred: for every week, every node in the graph that existed as of that week has an associated cred score.
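If it helps to picture the output, here's a minimal sketch of what "a cred score per node per week" could look like as data. The `ToyTimelineCred` type, the week keys, the node names, and the scores are all assumptions for illustration; the real TimelineCred class stores its data differently.

```typescript
// Hypothetical shape: week start date -> (node address -> cred that week).
// A node only appears in the weeks where it already existed.
type ToyTimelineCred = Map<string, Map<string, number>>;

const timeline: ToyTimelineCred = new Map<string, Map<string, number>>([
  ["2020-01-05", new Map<string, number>([["user/alice", 4]])],
  [
    "2020-01-12",
    new Map<string, number>([
      ["user/alice", 3],
      ["user/bob", 1],
    ]),
  ],
]);

// Sum a node's cred across every week in the timeline.
function totalCred(t: ToyTimelineCred, node: string): number {
  let total = 0;
  t.forEach((scores) => {
    const score = scores.get(node);
    if (score !== undefined) {
      total += score;
    }
  });
  return total;
}

console.log(totalCred(timeline, "user/alice"));
console.log(totalCred(timeline, "user/bob"));
```

Notebook work that only reads cred scores is basically working with data of this general kind: per-week scores that you can slice, sum, and chart.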
Overall Advice
Overall, I think the “help wanted” label on this initiative is mostly premature (I didn’t actually write that bit). Currently, it’s possible to access the cred data in a notebook, but it isn’t possible to inspect or modify the process that generates cred. In the future, it will be possible to load the graph for a given project into a notebook and then drive the cred calculation process from the notebook, potentially changing the weights, parameters, or even tweaking how the cred algorithm works.
This is not possible yet because we can’t import SourceCred core modules into an Observable Notebook. To do this, we first need to publish SourceCred on NPM. We will need an experienced JS developer to champion that before we can unlock the true potential of cred analysis notebooks.
However, there is still some interesting notebook work to be done: basically anything that depends on the cred scores but not on changing or re-computing them. One notebook that I started and haven’t finished is this participation power exploration. The idea is to formally explore participation power as a mathematical construct, looking at how the parameter choice and construction will affect things like the Gini coefficient of power within SourceCred.
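For reference, here's a small sketch of the Gini coefficient part. It uses the standard sorted-values formula, G = 2·Σ(i·x_i) / (n·Σx_i) − (n + 1)/n, with values sorted ascending and i starting at 1. The function name and the sample inputs are mine, and it says nothing about how participation power itself is computed; it just takes a list of non-negative power values as given.

```typescript
// Gini coefficient of a list of non-negative values.
// 0 means perfectly equal; (n - 1) / n is the maximum for n values.
function giniCoefficient(values: number[]): number {
  const n = values.length;
  const sorted = values.slice().sort((a, b) => a - b);
  const sum = sorted.reduce((acc, x) => acc + x, 0);
  if (n === 0 || sum === 0) {
    return 0; // Convention chosen for this sketch: no values or no power.
  }
  // Weighted sum with 1-based ranks over the ascending values.
  const weighted = sorted.reduce((acc, x, i) => acc + (i + 1) * x, 0);
  return (2 * weighted) / (n * sum) - (n + 1) / n;
}

// Hypothetical power distributions for four participants.
const equalPower = giniCoefficient([1, 1, 1, 1]);
const concentratedPower = giniCoefficient([0, 0, 0, 1]);
console.log(equalPower, concentratedPower);
```

With something like this in a notebook, you could sweep a parameter, recompute the power values, and plot how the coefficient moves.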
If you wanted to take on a notebook, that one could be a good place to start! You’ll definitely need to keep demonstrating independent sleuthing skills, and we could also have a call as a context dump where I share the “what, why, how, and alternatives considered” of that notebook.
Also, I recommend watching my old SourceCred code walkthrough as a way to get context on the SourceCred codebase. A number of the specific details have changed; however, the overall tools and organizational patterns have not.