Suppose that the SourceCred team goes to a hackathon, and you help me contribute there. Suppose that you arrange my flights, so that I can actually make it to the hackathon. Clearly, you deserve some cred.
How can we actually represent that within the SourceCred graph?
To start, we should make a node in the graph that represents your contribution: “arranging Dandelion’s hackathon logistics”. But how do we connect that node, so that you receive an appropriate amount of cred?
I imagine that the cred graph already has a node that represents me, a node that represents the hackathon, and a node that represents “logistics” as a general value we share. How do we connect your contribution to them?
- We could connect your contribution directly to me, but that means that you get a share of my cred, regardless of whether it’s for work at the hackathon or elsewhere.
- We could connect your contribution directly to the hackathon, but that means you get a share of everyone’s cred, regardless of whether my personal contributions were actually important.
- We could connect your contribution directly to the logistics value, but that means you get a share of all logistics work, even if both the hackathon and I were unimportant.
Clearly, we need a more expressive way to connect these nodes.
This is a particular instance of a very general problem. We want to be able to represent contextual cred. For example, if you help teach me Git, you earn cred in the Git-related work that I do; if I come up with the architecture for the GitHub plugin, I earn “architecture-cred” in the context of the GitHub plugin. A particularly important type of “context” is the temporal context: we want to be able to ask, “who earned cred in the past week?”
We could solve this by re-organizing the graph to privilege the question we’re trying to ask. For example, rather than creating a single identity node for me across the project, we could create a “scoped identity” node that represents “me-at-the-hackathon”. Then, all of my contributions at the hackathon can be directly connected to “me-at-the-hackathon”, and “me-at-the-hackathon” can be connected back to me. Now, we could connect your contribution to “me-at-the-hackathon”, and the cred works out. (If we want to be able to represent that your logistics work supported our overall logistics value at the hackathon, we could also create a “logistics-at-the-hackathon” node.)
This approach works, but it both adds a lot of complexity (every other plugin needs to know about the “me-at-the-hackathon” node) and privileges one particular context in the graph. What if instead of helping me attend the hackathon, you helped teach me linear algebra, thus enabling me to be more effective in all linear-algebra-related domains? The answer can’t be to create a “me-doing-linear-algebra” node, as this implies we’re going to explode the graph with the cross product of every person and every possible context.
We could imagine solving this by adding more metadata to the graph. Maybe every edge can have a list of “contexts” as metadata, where in this case the “context” for the edge connecting your contribution to me consists of the “logistics” and “hackathon” nodes. In the most general (and powerful) case where contexts may be arbitrary nodes in the graph, we’ve essentially upgraded from a graph to a hypergraph. This would give us a lot of power… but it would also make things a lot more complicated.
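As a rough sketch of what that edge metadata might look like: the `ContextualEdge` type and the `edges_in_context` helper below are hypothetical names I made up for illustration, not anything that exists in SourceCred today. Each edge carries a set of context nodes, and we can ask for only the edges that belong to a given context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualEdge:
    """Hypothetical edge type: src, dst, and every context are node names."""
    src: str
    dst: str
    contexts: frozenset = frozenset()


def edges_in_context(edges, required):
    """Return the edges whose context set includes every node in `required`."""
    required = frozenset(required)
    return [edge for edge in edges if required <= edge.contexts]


edges = [
    ContextualEdge("arranging-flights", "dandelion", frozenset({"logistics", "hackathon"})),
    ContextualEdge("teaching-git", "dandelion", frozenset({"git"})),
]

# Only the flight-arranging edge carries the "hackathon" context.
hackathon_edges = edges_in_context(edges, {"hackathon"})
```

Because each edge now touches its two endpoints plus any number of context nodes, this is where the graph starts behaving like a hypergraph.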
We could also accommodate contexts by encoding the same information directly into the graph, and then changing the cred algorithm to make use of this new information.
For example, what if we just connected your contribution directly to all three nodes: me, the hackathon, and to logistics? Under the semantics of the current algorithm, this would give your contribution cred from all three nodes. That doesn’t seem quite right to me, but it would give us a way to answer questions like: “how much logistics cred do you have in the context of the hackathon” (or, equivalently, “how much hackathon cred do you have in the context of logistics”).
We could do this by getting a “logistics-cred” via seeded PageRank starting at the logistics node, then a “hackathon-cred” via seeded PageRank starting at the hackathon node, and then multiplying the two scores for each node and re-normalizing. (Or maybe you could just run seeded PageRank starting at both the hackathon and logistics nodes?)
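To make the multiply-and-renormalize idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy graph, the node names, the choice to treat edges as bidirectional, the `alpha` value, and the function names `seeded_pagerank` and `contextual_cred`. It is not how SourceCred currently computes cred.

```python
def seeded_pagerank(edges, seeds, alpha=0.85, iterations=100):
    """Power-iteration PageRank that teleports only to the seed nodes.

    Assumes every seed appears in at least one edge.
    """
    nodes = sorted({node for edge in edges for node in edge})
    # Assumption for this sketch: every edge is bidirectional.
    neighbors = {node: [] for node in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seed_weight = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(seed_weight)
    for _ in range(iterations):
        # With probability (1 - alpha), jump back to a seed node.
        nxt = {n: (1 - alpha) * seed_weight[n] for n in nodes}
        # Otherwise, follow a uniformly random edge out of the current node.
        for n in nodes:
            share = alpha * score[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        score = nxt
    return score


def contextual_cred(edges, context_a, context_b):
    """Multiply two single-seed scores per node, then re-normalize to sum to 1."""
    cred_a = seeded_pagerank(edges, {context_a})
    cred_b = seeded_pagerank(edges, {context_b})
    product = {n: cred_a[n] * cred_b[n] for n in cred_a}
    total = sum(product.values())
    return {n: p / total for n, p in product.items()}


# Toy graph: "flights" is connected to all three nodes, as proposed above.
EDGES = [
    ("flights", "dandelion"),
    ("flights", "hackathon"),
    ("flights", "logistics"),
    ("demo", "hackathon"),
    ("demo", "dandelion"),
    ("plugin-pr", "dandelion"),
    ("office-lease", "logistics"),
]

cred = contextual_cred(EDGES, "hackathon", "logistics")
joint = seeded_pagerank(EDGES, {"hackathon", "logistics"})
```

In this toy graph, the flights contribution ends up with more hackathon-and-logistics cred than the office lease, which touches logistics but not the hackathon. As for the alternative in the parenthetical: seeded PageRank is linear in its seed vector, so `joint` is just the average of the two single-seed scores. That behaves more like “hackathon or logistics” than “hackathon and logistics”, so the two approaches answer different questions.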
I really don’t know what the right approach is. But I think developing this capability will be enormously important to the project as a whole, so I hope that you all will have some ideas.
Thanks to @mzargham, whose conversations with me substantially informed these ideas. Also, this post exists in the context of the Odyssey hackathon.