SourceCred

A Gentle Introduction to Cred

SourceCred, as the name might suggest, is all about attributing cred. Cred is a metric that describes every contribution and contributor in a project, giving a sense of how important they were.

For example, forum posts (like this one) can earn cred. If this post earns 5 cred, and another post earns 10 cred, then that other post is considered twice as important as this one. (Note: SourceCred doesn’t load posts from this forum yet.)

The contributions are arranged in a graph, where contributions are nodes, and have edges indicating how they relate to other contributions. For example, if someone writes a reply to this post, there will be an edge connecting the reply and this post. Edges have “types” which give some information about what kind of edge they are; that edge might have an “IS_REPLY” type.

Contributors—like you or me—are also nodes in the graph, and are connected to the contributions that they create. So there is an “AUTHORS” edge connecting me (the author) to this post.

You can think of each edge in the graph as being a “thank you”. Thus, this post thanks me for writing it. The “thank yous” can be bidirectional; likely your reply thanks this post for being written, and the post thanks your reply for participating with it.
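To make the graph picture concrete, here is a minimal sketch of typed edges and the "thank yous" they imply in both directions. The node names and the `Edge` class are made up for illustration; they are not SourceCred's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str        # node the edge points from
    dst: str        # node the edge points to
    edge_type: str  # e.g. "AUTHORS" or "IS_REPLY"


# Hypothetical node names, chosen only for this example.
nodes = {"user:author", "post:intro", "post:reply"}
edges = [
    Edge("user:author", "post:intro", "AUTHORS"),
    Edge("post:reply", "post:intro", "IS_REPLY"),
]


def thank_yous(edges):
    """Expand every edge into a connection in each direction."""
    pairs = []
    for e in edges:
        pairs.append((e.src, e.dst, e.edge_type))  # along the edge
        pairs.append((e.dst, e.src, e.edge_type))  # against the edge
    return pairs


connections = thank_yous(edges)
```

Each of the two edges yields two connections, so the post and the author end up linked in both directions.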

SourceCred converts this graph into a numerical score via PageRank. Basically, we assign cred to the nodes so that every node receives cred from every node that thanks it, and in turn sends its cred to every node that it thanks. This means that cred accumulates at important nodes. For example, a core maintainer is “thanked” by all of the posts, comments, and issues that they’ve written, so they have a lot of cred. On the other hand, a spam post on the forum may have been thanked by no one, so it will have very little cred. (PageRank is a very interesting algorithm, and was actually the basis of Google search! If you want to learn more, I recommend the original PageRank paper.)

One important thing to remember is that the amount of cred a node receives is the same as the amount that it sends to other nodes. This means that being thanked by a high-cred node is much more valuable than being thanked by a low-cred node, especially if that high-cred node didn’t thank many other nodes.
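If it helps, here is a rough sketch of textbook PageRank by power iteration on a toy graph. The damping value `alpha`, the node names, and the handling of nodes with no outgoing thanks are assumptions made for the example; this is not SourceCred's actual implementation.

```python
def pagerank(out_links, alpha=0.15, iterations=100):
    """out_links maps each node to the list of nodes it thanks.

    alpha is the fraction of cred that is redistributed evenly on every
    step (standard PageRank teleportation; the value is an assumption).
    """
    nodes = list(out_links)
    n = len(nodes)
    cred = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        nxt = {node: alpha / n for node in nodes}
        for node, targets in out_links.items():
            flow = (1 - alpha) * cred[node]
            if targets:
                # A node splits its cred evenly among the nodes it thanks.
                for t in targets:
                    nxt[t] += flow / len(targets)
            else:
                # A node that thanks nobody spreads its cred evenly.
                for t in nodes:
                    nxt[t] += flow / n
        cred = nxt
    return cred


# Toy graph: a maintainer and three posts thank each other; a spam post
# neither thanks nor is thanked by anything.
toy = {
    "maintainer": ["post1", "post2", "post3"],
    "post1": ["maintainer"],
    "post2": ["maintainer"],
    "post3": ["maintainer"],
    "spam": [],
}
scores = pagerank(toy)
```

In this toy graph the maintainer ends up with the most cred, each post gets a smaller share, and the spam post gets the least. Note how each post only receives a third of the maintainer's outgoing cred, because the maintainer thanks three nodes.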

We can explore all of this in the SourceCred explorer. To get started with the explorer, load a repository and then run PageRank. It will then show all of the user nodes in the graph, sorted by their cred.

In the screenshot below, we can see a list of all the users that contributed to SourceCred, sorted by their cred.

Remember that the graph actually contains every contributor and contribution to SourceCred. The explorer defaults to showing just the user nodes, but we can use the filter select to find every node instead.

If we expand a single node, we can see how that node received its cred via its connections to other nodes. At the top level, it aggregates groups of connections based on the type of edge, and the type of node the edge connects to. The percentages show what fraction of the node’s cred came from that connection, and the numbers show how much total cred came from that connection.
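The grouping described here can be sketched as a simple aggregation. The tuple layout and the type names ("AUTHORS", "PULL", "COMMENT") are placeholders picked for the example, not the explorer's real internals.

```python
from collections import defaultdict


def aggregate(connections, node_cred):
    """Group a node's incoming cred by (edge type, neighbor node type).

    connections: list of (edge_type, neighbor_type, cred_contributed).
    Returns {group: (total cred, fraction of the node's cred)}.
    """
    totals = defaultdict(float)
    for edge_type, neighbor_type, amount in connections:
        totals[(edge_type, neighbor_type)] += amount
    return {
        group: (amount, amount / node_cred)
        for group, amount in totals.items()
    }


# Hypothetical numbers for a node holding 10 cred in total.
example = [
    ("AUTHORS", "PULL", 6.0),
    ("AUTHORS", "PULL", 2.0),
    ("AUTHORS", "COMMENT", 2.0),
]
summary = aggregate(example, node_cred=10.0)
```

Here the two pull request connections collapse into one group holding 8 cred (80%), and the comment connection forms a second group with 2 cred (20%).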

Then, diving down within a particular group of connections, we can see all of the individual edges along with how much cred they contributed.

If we want to learn more about a particular edge, we can expand it to see the node that edge connects to. This gives us the ability to dive into the graph from a fresh starting point. As you go “deeper” in your exploration of the graph, the color becomes deeper as well.

Nodes and edges have weights, which make them more or less important. The effect of edge weights is straightforward: when cred is flowing out of a node to its neighbors, the amount that flows to each neighbor is proportional to its edge weight. Node weights take effect by modifying the edge weights: edges pointing to a high weight node get more weight. (We might change how this works in the future.)
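As a small illustration of proportional flow, the sketch below splits a node's outgoing cred by edge weight. Multiplying the edge weight by the target's node weight is my assumption about how "edges pointing to a high weight node get more weight" could work; the exact formula isn't spelled out here.

```python
def outflow(cred, edge_weights, node_weights=None):
    """Split `cred` among neighbors in proportion to effective edge weight.

    edge_weights: {neighbor: weight of the edge to that neighbor}.
    node_weights: optional {neighbor: node weight}; missing entries are 1.0.
    Assumes at least one edge has a positive effective weight.
    """
    node_weights = node_weights or {}
    effective = {
        nbr: w * node_weights.get(nbr, 1.0)  # assumed combination rule
        for nbr, w in edge_weights.items()
    }
    total = sum(effective.values())
    return {nbr: cred * w / total for nbr, w in effective.items()}


# 12 cred leaving a node with a weight-2 edge and a weight-1 edge.
plain = outflow(12.0, {"pull": 2.0, "comment": 1.0})
# Same edges, but the comment node now has node weight 2.
boosted = outflow(12.0, {"pull": 2.0, "comment": 1.0}, {"comment": 2.0})
```

With edge weights alone, the pull request receives twice as much as the comment. Doubling the comment's node weight evens out the split.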

Right now, weights can only be configured at the type level. You can open the weight configuration in the cred explorer by clicking the button labeled “Show weight configuration”. Using the weight config, you can make certain node or edge types more important than others. For example, maybe you think authors edges should be more important than references edges, or pull request nodes should be more important than comment nodes. You can express that using the weight config.

The weight config also lets you set the “directionality” of edges. Remember, every edge technically points in both directions. The directionality lets you make it point more in one direction than the other. If the directionality is 0.5, then cred flows forward and backward in equal amounts. If the directionality is higher, say 0.9, then 90% of the cred will flow forward and only 10% will flow back. Edges are always named as verb phrases, so that the edge points from subject to object, e.g. “authors” edges point from author to content, and “has parent” edges point from child to parent.
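One way to picture directionality is as a split of an edge's weight into a forward part and a backward part. The function below is only a sketch under that assumption; the name `split_by_directionality` is made up.

```python
def split_by_directionality(weight, directionality):
    """Return (forward_weight, backward_weight) for one edge.

    directionality is a number in [0, 1]: 0.5 treats both directions
    equally, 0.9 sends 90% of the weight forward and 10% backward.
    """
    if not 0.0 <= directionality <= 1.0:
        raise ValueError("directionality must be between 0 and 1")
    forward = weight * directionality
    backward = weight * (1.0 - directionality)
    return forward, backward


even = split_by_directionality(1.0, 0.5)
mostly_forward = split_by_directionality(1.0, 0.9)
```

For an “authors” edge, “forward” means from the author to the content, so a high directionality would favor that direction.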

In the future, we plan to add a more powerful weight configuration system called heuristics. Heuristics will provide a way of evaluating different nodes within the same type: for example, you could add a heuristic that pull requests that touch many lines of code are more important, or that forum posts that are just a link are not very important. Heuristics will be pluggable, so that projects can define their own heuristics.
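Since heuristics don't exist yet, the following is purely a hypothetical sketch of what "pluggable" could mean: each heuristic is a function from a node to a weight multiplier, and a project chooses which ones to apply. Every name, field, and number below is invented for illustration.

```python
from typing import Callable, Dict, List

Node = Dict[str, object]
Heuristic = Callable[[Node], float]


def lines_changed_heuristic(node: Node) -> float:
    """Pull requests touching more lines get up to double weight."""
    if node.get("type") != "PULL":
        return 1.0
    lines = min(int(node.get("lines_changed", 0)), 1000)
    return 1.0 + lines / 1000


def link_only_heuristic(node: Node) -> float:
    """Forum posts that are just a link get a much lower weight."""
    if node.get("type") == "POST" and node.get("link_only"):
        return 0.1
    return 1.0


def node_weight(node: Node, heuristics: List[Heuristic]) -> float:
    """Multiply together the multipliers from every enabled heuristic."""
    weight = 1.0
    for heuristic in heuristics:
        weight *= heuristic(node)
    return weight


enabled = [lines_changed_heuristic, link_only_heuristic]
big_pull = node_weight({"type": "PULL", "lines_changed": 500}, enabled)
link_post = node_weight({"type": "POST", "link_only": True}, enabled)
```

The point of the design is that a project could swap the `enabled` list for its own functions without touching the rest of the system.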

One nice thing about this system is that it’s very flexible and general-purpose. The SourceCred core system creates algorithms for attributing cred, and tools for exploring and moderating cred distributions. All of the actual nodes and edges come from plugins. SourceCred is focused on creating cred for open-source, so we’ll be putting a lot of attention into the GitHub plugin, the Git plugin, and other source-code related plugins. However, SourceCred could be used for many different applications. For example, in a music-oriented community, songs and samples could be nodes; songs could thank the samples they use, remixes could thank the originals, etc. I think that academic papers and citations are another natural domain for cred.

Let me know with a reply if anything here is unclear, or if you have improvements to suggest. You’ll earn some cred in the process :wink:

Thanks, this is an excellent explainer article. I’m particularly fond of the future Heuristics upgrade; bias of the crowd is a huge problem, especially in academic funding whereby popular “click-bait” topics extract funding for purposes that are scientifically unsound and flavour of the month… e.g. Al Gore and his dodgy correlative - rather than causative - metrics…

My question is: how does self-population of the graph work? How do we add contributors? (Assuming that unimportant contributions will be ignored, as outlined above.)

SourceCred is built around plugins for capturing different domains of data. So each plugin can have its own method for populating the graph, and then at the project level we merge graphs across plugins.

To be concrete: Here’s a list of current and planned plugins, and how they populate the graph.

  • Git Plugin

Currently exists, populates the graph by cloning a git repository and then creating nodes for every commit, etc.

  • GitHub plugin

Currently exists, populates the graph by downloading the history of a repository from GitHub, then creating nodes for every issue, pull request, etc., along with nodes for users’ GitHub accounts.

  • Discourse plugin

Planned (but not yet live); will populate the graph by creating a node for every post and reply here on this forum, along with nodes for users’ Discourse accounts.

  • Odyssey plugin

Under active development, acts as a “manual mode” that allows users to directly add nodes and edges to the graph. Can be a catch-all for recording work that gets missed by the other plugins, as well as providing explicit guidance on what the project’s priorities and values are.

So, as you can see, SourceCred is really flexible, and can basically get contributions from anywhere they are happening (or they can get manually added via the Odyssey plugin). For example, if we had a community producing a lot of Medium content, we could write a Medium plugin that creates nodes for every post, along with edges showing who authored them, who helped review them, how they reference other things, etc.
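To illustrate the "each plugin builds a graph, then we merge" idea, here is a minimal sketch. The dictionary layout, node names, and `merge_graphs` function are assumptions for the example, not the real plugin API.

```python
def merge_graphs(graphs):
    """Union the nodes and edges produced by several plugins.

    Each graph is {"nodes": set of names, "edges": set of (src, dst, type)}.
    """
    nodes, edges = set(), set()
    for graph in graphs:
        nodes |= graph["nodes"]
        edges |= graph["edges"]
    # Every edge must connect nodes that exist somewhere in the merged graph.
    for src, dst, _ in edges:
        if src not in nodes or dst not in nodes:
            raise ValueError(f"dangling edge: {src} -> {dst}")
    return {"nodes": nodes, "edges": edges}


# Hypothetical output of two plugins for the same project.
git_graph = {
    "nodes": {"commit:abc"},
    "edges": set(),
}
github_graph = {
    "nodes": {"user:alice", "pull:1", "commit:abc"},
    "edges": {
        ("user:alice", "pull:1", "AUTHORS"),
        ("pull:1", "commit:abc", "MERGED_AS"),
    },
}
project_graph = merge_graphs([git_graph, github_graph])
```

The commit node shows up in both plugins' output but appears only once after merging, which is what lets edges from one plugin attach to nodes from another.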

Is this still the latest high level overview of the SourceCred protocol?

Also, is the plan to have it be an arbitrary program that can be run on a server and applied to any community/project, or is the idea to run it as a set of contracts on a blockchain, or to actually roll your own blockchain that processes cred and allows people to build apps on top that connect to their communities?

This document is still mostly accurate, but mildly outdated in some particulars:

  • SourceCred now has “timeline cred”, which shows how much cred each individual earned every week, rather than a single lifetime score. Much better for doing things like ongoing payments (of cash or tokens). You can read a bit more about it in the Timeline Cred announcement post and in this thread about attacks / cred farming.
  • Timeline cred has a new UI, which doesn’t (yet) support recursively exploring where a node’s cred came from. However, there is still a “legacy” mode that shows the UI depicted in the original post.
  • Node weights can be set manually for individual nodes, not just at the type level.
  • The way that node weights work has changed: they now affect how much cred “resets” to high-weight nodes, rather than tweaking how cred flows from adjacent nodes.
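The last point can be sketched as PageRank with a weighted reset (often called personalized PageRank): on every step, a fraction `alpha` of all cred is redistributed in proportion to node weight. This is my reading of the description, with made-up parameter values, not the actual implementation.

```python
def pagerank_with_reset(out_links, node_weights, alpha=0.15, iterations=200):
    """PageRank where the reset distribution follows node weights.

    out_links: {node: [nodes it thanks]}.
    node_weights: {node: weight}; missing entries default to 1.0.
    """
    nodes = list(out_links)
    total_weight = sum(node_weights.get(n, 1.0) for n in nodes)
    seed = {n: node_weights.get(n, 1.0) / total_weight for n in nodes}
    cred = dict(seed)
    for _ in range(iterations):
        # The reset step: alpha of the cred goes to nodes by weight.
        nxt = {n: alpha * seed[n] for n in nodes}
        for node, targets in out_links.items():
            flow = (1 - alpha) * cred[node]
            if targets:
                for t in targets:
                    nxt[t] += flow / len(targets)
            else:
                # Nodes with no outgoing edges also reset by weight.
                for t in nodes:
                    nxt[t] += flow * seed[t]
        cred = nxt
    return cred


# Two nodes that thank each other symmetrically.
links = {"a": ["b"], "b": ["a"]}
equal = pagerank_with_reset(links, {})
favored = pagerank_with_reset(links, {"a": 3.0})
```

With equal weights the two nodes tie. Giving `a` a weight of 3 tilts the result toward `a`, even though the edges between them are unchanged.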

The SourceCred implementation is a JS-based program that anyone can run. In principle, you could write an implementation that runs on a blockchain; however, this would have two problems. First, calculating cred is both compute and data intensive, and I don’t know of any blockchains that scale well enough to support calculating cred on-chain. Second, much of the data “lives” off chain (e.g. GitHub history), so at the moment there is no way to avoid having an oracle.

However, SourceCred is decentralized in that anyone can run an instance on a project, and using the same config, they should get the same results. So a project could appoint a set of semi-trusted validators to run cred off-chain and then report the results on-chain, and it would be easy for everyone in the community to check that the work is being done faithfully.
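As a rough sketch of how that checking could work: each validator computes cred, serializes the scores in a canonical way, and publishes a hash; anyone can recompute and compare. The rounding step and function names are assumptions for the example, and nothing like this is implied to exist in SourceCred itself.

```python
import hashlib
import json


def result_digest(scores, digits=6):
    """Hash cred scores in a canonical form so that runs can be compared.

    Rounding guards against tiny floating-point differences; the number
    of digits is an arbitrary choice for this sketch.
    """
    rounded = {name: round(value, digits) for name, value in scores.items()}
    canonical = json.dumps(rounded, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validators_agree(digests):
    """True if every validator reported the same digest."""
    return len(set(digests)) == 1


# Two validators that got the same scores, listed in a different order.
mine = result_digest({"alice": 0.75, "bob": 0.25})
theirs = result_digest({"bob": 0.25, "alice": 0.75})
# A validator that reports different scores.
other = result_digest({"alice": 0.70, "bob": 0.30})
```

Only the digest would need to go on-chain; a community member who doubts it can rerun cred with the same config and compare hashes.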

Let me know if that doesn’t answer your question.

Awesome! Sounds like it would work really well for a PoA sidechain, since those already rely on a trusted set of validators. Could also create a staking/challenge game with rewards/penalties, or use something like the Aragon Court.

Thanks for the intro!

Two things come to mind: 1/ Once a couple of projects have successfully leveraged SourceCred, it’ll be very tempting for any similar project to jump on the SourceCred train. Is it possible to see such projects at the moment? 2/ I’m guessing that a lot of projects will be lost at first, so they will probably want to set up SourceCred and adjust parameters as they go. Are you going to work on providing some good practices for SourceCred parametrization?

As far as I know, sfosc is the only other project using cred scores so far.

Yeah, coming up with good patterns for choosing the parameters is going to be extremely important as SourceCred gets adoption. For example, so far we’ve discussed how to get the community involved in choosing the parameters:

But really, the first thing to do is kick off the CredSperiment which will give us “real” experience with SourceCred and parameter choices. We won’t be able to come up with best practices for others until we’ve found working solutions for ourselves.

Would love to see a spec for a network of incentivized oracles that validate each other and report their results back to the chain (when requested). This way an on-chain entity (DAO) could:

  1. Pay for GitHub SourceCred indexing/computing by the network.
  2. Ask the network to oraclize specific metrics for a specific time period. For example - give me a merkle root that represents each contributor’s % of cred out of the whole.
  3. Use the metrics that the network pushes & validates on-chain to then do other on-chain things, like distribute and claim funds.
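For step 2, a Merkle root over each contributor's share of cred could be computed roughly as below. The leaf encoding (`name:share` with six decimals) and the duplicate-last-node padding are assumptions for this sketch; a real design would need an agreed encoding that matches the on-chain verifier.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute a Merkle root from a non-empty list of byte-string leaves."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        level = [
            _sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


def cred_share_leaves(cred):
    """Encode each contributor's fraction of total cred as one leaf."""
    total = sum(cred.values())
    return [
        f"{name}:{cred[name] / total:.6f}".encode("utf-8")
        for name in sorted(cred)  # sorted so every oracle gets the same order
    ]


# Hypothetical cred totals for one time period.
period_cred = {"alice": 30.0, "bob": 10.0, "carol": 60.0}
root = merkle_root(cred_share_leaves(period_cred))
```

A contributor could then claim funds by presenting their leaf plus a Merkle proof against the published root, so the full list never has to be stored on-chain.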

The use case described above comes from wanting DAOs to be able to easily allocate & distribute on-chain assets for off-chain activity, without having to run their own trusted servers.

Note: A robust solution for linking off-chain accounts w/ a public key is needed.