Greetings, credizens! I am very pleased to share a rough draft of SourceCred’s own cred, for the CredSperiment.
In contrast to the instances I shared in the CredSperiment Progress Report, we now have a combined instance which properly resolves identities across GitHub and Discourse.
You can check out the instance here. For posterity: if that URL is ever down, you can also run any HTTP server from the docs subdirectory of this commit.
My analysis of the scores
Here are the scores from the new instance:
On the whole, I think the scores are pretty reasonable; at least as an ordering, they correspond reasonably well to my intuitions about who has been contributing to the project. (Although some very important contributors, like Juan Benet, are missing entirely, because they contributed offline.)
Weight Tweaks
It’s not entirely a happy accident that the scores reflect my intuition… I tweaked the weights!
Mostly, the weight tweaking consisted of pushing down the GitHub weights and increasing the Discourse weights, as I felt that Discourse contributions were undervalued with the default weights. You can click “show weight configuration” in the top-right to see the weights I used, and you can change them and recompute if you want. Feel free to suggest different weights on this thread!
(If you want to see which weights I changed from their default values, you can take a look at the weights.json file.)
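To make that concrete, here's a rough sketch of the kind of configuration involved. The key strings and overall shape below are illustrative only; they are not the actual weights.json format, which you should check in the repository itself:

```json
{
  "nodeTypeWeights": {
    "sourcecred/discourse/topic": 2,
    "sourcecred/discourse/post": 1,
    "sourcecred/github/pull": 0.5,
    "sourcecred/github/comment": 0.25
  },
  "edgeTypeWeights": {
    "sourcecred/github/references": { "forwards": 1, "backwards": 0 }
  }
}
```

The gist: node type weights scale how important each kind of contribution is treated as being, and edge type weights control how readily cred flows in each direction along each kind of edge.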
Scores for different contribution types
A deeper way to explore how SourceCred is doing is to look at the scores not just for the users, but for the contributions themselves. Ideally, when looking at the top contributions within a category, you'll immediately say "yeah, that stuff was all really important!". If SourceCred is struggling, the top contributions will instead look more like a random selection.
GitHub issues
Starting with the positive: I think SourceCred actually does a pretty good job of identifying important issues on GitHub:
Those top issues all correspond to really important features in SourceCred. The scores are working well because I tend to create a “tracking issue” for a broad area of work, and then reference that tracking issue from each pull request that works on it. SourceCred detects the reference edges, and thus flows cred to important tracking issues.
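To illustrate the structure (with made-up identifiers, not SourceCred's actual data model): each pull request's description references the tracking issue, that reference becomes an edge in the graph, and cred flows along those edges.

```typescript
// Illustrative sketch only: made-up ids, not SourceCred's actual API.
type NodeId = string;
interface Edge {
  src: NodeId; // the referencing contribution
  dst: NodeId; // the contribution being referenced
}

// One tracking issue, plus two pull requests whose descriptions
// reference it ("progress on the tracking issue").
const nodes: NodeId[] = ["issue/tracking", "pull/featureA", "pull/featureB"];
const referenceEdges: Edge[] = [
  { src: "pull/featureA", dst: "issue/tracking" },
  { src: "pull/featureB", dst: "issue/tracking" },
];
// When scores are computed, part of each pull request's cred flows
// along its reference edge, so the tracking issue accumulates cred
// from all the work done under it.
```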
Discourse Topics
However, it struggles a bit more with identifying important Discourse topics:
Many of these top topics (including the top 3) are by users who have very little engagement beyond posting a topic; they have very few "out-edges", so their cred gets stuck in a self-referential loop. I discuss this in more depth in Sneak peek: SourceCred Discourse Plugin. I also have a plan for quantitatively detecting when this is happening, along with a fix for the bug; more on that later.
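To see why the cred gets stuck, here's a toy PageRank computation (not SourceCred's actual code) on a three-node graph. The node `topicOnly` stands in for a user who posted one topic and nothing else; since it has no out-edges, I model it with a self-loop, which is exactly the self-referential trap described above:

```typescript
// Toy PageRank illustrating the "stuck cred" issue (not SourceCred code).
const nodes = ["alice", "bob", "topicOnly"] as const;
type Node = (typeof nodes)[number];

// Adjacency: for each node, the nodes it points to.
const out: Record<Node, Node[]> = {
  alice: ["bob", "topicOnly"],
  bob: ["alice", "topicOnly"],
  topicOnly: ["topicOnly"], // self-loop: cred that arrives here stays here
};

const alpha = 0.05; // teleportation probability
let score: Record<Node, number> = { alice: 1 / 3, bob: 1 / 3, topicOnly: 1 / 3 };

for (let iter = 0; iter < 200; iter++) {
  const next: Record<Node, number> = { alice: 0, bob: 0, topicOnly: 0 };
  for (const n of nodes) {
    // Each node splits (1 - alpha) of its score among its out-edges.
    for (const m of out[n]) {
      next[m] += ((1 - alpha) * score[n]) / out[n].length;
    }
  }
  // The remaining alpha is spread uniformly (teleportation).
  for (const n of nodes) next[n] += alpha / nodes.length;
  score = next;
}
console.log(score);
```

With a small teleportation probability like this, `topicOnly` ends up with roughly 94% of the total score, even though it receives no more links than anyone else: nearly all the cred that wanders in never leaves.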
That said, the Discourse plugin is still missing a vital feature: reference detection. The important posts and topics around here tend to get referenced a lot; we don’t track that yet, but it’s pretty easy to add. I expect this will improve the cred quality a lot.
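For a sense of how simple that detection could be, here's a hypothetical sketch (not the actual plugin code). Discourse topic URLs have the form /t/&lt;slug&gt;/&lt;topicId&gt;, so we can scan a post's rendered HTML for links of that shape and record which topics it references:

```typescript
// Hypothetical sketch, not the actual plugin code: collect the ids of
// topics that a post links to, so we can add reference edges for them.
function findTopicReferences(postHtml: string): number[] {
  const ids: number[] = [];
  const pattern = /\/t\/[\w-]+\/(\d+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(postHtml)) !== null) {
    ids.push(Number(match[1]));
  }
  return ids;
}

// Example: a post linking to (hypothetical) topic 287 yields [287].
findTopicReferences(
  '<a href="https://discourse.sourcecred.io/t/some-topic/287">see here</a>'
);
```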
GitHub Pull Requests
SourceCred does even worse at identifying important pull requests:
Honestly, this might be no better than random. The issue is that for most pull requests, SourceCred has very little information about how much they mattered; we're not yet looking at which files they touched, or module dependencies, or even the number of lines of code changed! There are likes, true, but many important pull requests go unnoticed and unliked.
We’ll need to get better at this. Personally, I would really like to create a way for trusted contributors to directly opine on how valuable different pulls are. That could be through the “boost” mechanic, or as an even faster fix, @wchargin and I could assign manual weights to every pull request.
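As a hypothetical illustration of the manual-weights idea (this isn't an existing SourceCred feature; the ids and shapes here are made up): trusted contributors would maintain a map from pull requests to multipliers, which scales each pull's weight in the graph.

```typescript
// Hypothetical illustration only: manual per-pull weighting is not an
// existing SourceCred feature; the ids below are made up.
const manualPullWeights: Map<string, number> = new Map([
  ["github/pull/exampleBigFeature", 4], // major feature: weight it up
  ["github/pull/exampleTypoFix", 0.25], // trivial fix: weight it down
]);

// Scale a pull request's default weight by its manual multiplier,
// defaulting to 1 for pulls nobody has opined on.
function effectiveWeight(pullId: string, defaultWeight: number): number {
  const multiplier = manualPullWeights.get(pullId) ?? 1;
  return defaultWeight * multiplier;
}
```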
Scores by Domain
As an added dimension of analysis, we can take a look at the GitHub and Discourse sides of cred separately.
GitHub
In my opinion, the GitHub cred shows a clear limitation of assigning cred based on activity levels: after William stopped focusing on SourceCred full time last November, the total level of activity on GitHub dropped dramatically, but the rate of value creation did not drop nearly as fast. However, since I was doing most feature work on my own rather than with William, there were fewer comments, fewer reviews, etc., which means less cred in the current system.
Discourse
On the Discourse side, the cred looks pretty reasonable overall:
Though it stands out to me that @nayafia has one of the highest scores, despite having only two posts. As I mentioned above, this is because PageRank goes a little crazy for people who have essentially no out-edges, and I have a plan for fixing this.
Takeaways
As we've explored, these scores are imperfect. But I think they're also the best scores that SourceCred has yet produced! Integrating the Discourse plugin lets us recognize a host of really important contributors who were going unseen when we only used GitHub data.
The key question now is: are these scores good enough for us to inaugurate Phase 1 of the CredSperiment, and start paying based on the topline scores? In my opinion, the answer is “yes”, so I’m planning to go ahead with calculating payouts in time for the first week of October.
Please post your thoughts, your concerns, and your alternative weight configurations!