SourceCred

Cred, Cost, and 'Resistance'

Currently, we have a problem in SourceCred: we measure the value of activity, but not the associated cost.

This means that SourceCred will always reward doing more: more pull requests, more comments, more notifications, more activity, even if the cost of that activity exceeds its value.

Here’s an example that @wchargin and I have discussed. Consider two pull requests suggesting equally valuable changes to the codebase. One pull request has been carefully prepared, and merges smoothly, with a single review saying “LGTM”. The other one is extremely messy, and it takes dozens of review comments before it is ready to merge. Currently, the second PR will earn much more cred, because it has much more activity, including many connections to high-cred nodes (the maintainers). Clearly, this is a very bad incentive, because we are rewarding people for doing sloppy work, and creating more work for others.

Another classic example is the bikeshed discussion: suppose that we have two Discourse posts, one of which is framed in a neutral/constructive way, and the other is needlessly inflammatory. The inflammatory one will likely attract more activity, which, in the current construction, means more cred. Again, a very bad incentive!

I think we can approach this by giving SourceCred a way to model how “expensive” a contribution was, which I’m currently calling “resistance”. As a start, we could give every piece of activity some default resistance (maybe proportional to its content length – reading a long PR is more expensive than reading a one-line typo fix).

Then, we could flow resistance along the cred graph (if your issue attracts a lot of comments, it acquires resistance from its many children, etc), so that every user will wind up with an amount of cred and an amount of resistance.
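To make the two steps above concrete, here is a minimal sketch, not SourceCred code. Everything in it is an assumption for illustration: the names `default_resistance`, `accumulate_resistance`, and `resistance_by_user`, the `base` and `per_char` constants, and the choice to simply sum resistance up has-parent edges rather than flow it PageRank-style.

```python
def default_resistance(text, base=1.0, per_char=0.01):
    """Hypothetical default cost: a flat base plus a term proportional to length."""
    return base + per_char * len(text)


def accumulate_resistance(contents, parent_of):
    """Give each node its own resistance plus that of all its descendants.

    contents: node id -> text; parent_of: node id -> parent id (has-parent edges).
    Assumes the has-parent edges form a forest (no cycles).
    """
    own = {node: default_resistance(text) for node, text in contents.items()}
    total = dict(own)
    for node in contents:
        # Walk up the has-parent chain, adding this node's cost to each ancestor.
        ancestor = parent_of.get(node)
        while ancestor is not None:
            total[ancestor] += own[node]
            ancestor = parent_of.get(ancestor)
    return total


def resistance_by_user(total, author_of):
    """Sum node resistance per author."""
    per_user = {}
    for node, user in author_of.items():
        per_user[user] = per_user.get(user, 0.0) + total[node]
    return per_user


# Toy data: one issue with two shorter comments.
contents = {"issue": "x" * 100, "comment1": "y" * 50, "comment2": "z" * 50}
parent_of = {"comment1": "issue", "comment2": "issue"}
author_of = {"issue": "alice", "comment1": "bob", "comment2": "carol"}

total = accumulate_resistance(contents, parent_of)
per_user = resistance_by_user(total, author_of)
```

In this toy data the issue author ends up carrying the cost of every reply as well as her own post, which is exactly the behaviour the "acquires resistance from its many children" idea describes. Whether that is the right semantic is a separate question.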

Getting resistance isn’t bad; it just means that something was costly. (In this sense, it’s a very different meaning from downvotes.) Many parts of SourceCred that I like, like the Mirror module or the Graph, could be considered high resistance.

Some characteristic examples for {high/low} x {cred/resistance} in the context of GitHub:

  • High Cred, High Resistance: The Massive Refactor
  • High Cred, Low Resistance: The Neat Feature
  • Low Cred, High Resistance: The Inflammatory Bikeshed
  • Low Cred, Low Resistance: The Typo Fix

I think this idea still needs a fair bit of development (e.g. is flowing resistance like cred really the right semantic?) but if we can get it right, it will solve a lot of major incentive problems in SourceCred.

I want to think about this a lot more, particularly the ‘resistance’ concept, but my experience is that personalized pagerank will do a lot more for you than is intuitively obvious. Consider this reference case:

A canonical pagerank will cause every node in this subgraph to introduce a small amount of cred, and thus the jumble of connections will in fact increase the cred of the PR as described. However, if we are using a personalized pagerank where the source of the cred is the milestone that this PR delivers, then rather than improve the cred of the PR, the cacophony of comments will actually dilute it.
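For anyone who wants to poke at this, here is a small self-contained sketch of the comparison. It is not the hackathon implementation. The toy graph (one milestone, a clean PR with one comment, a messy PR with ten), the undirected unweighted edges, the node names, and `alpha=0.15` are all assumptions. The only difference between the two runs is the seed vector: uniform over all nodes versus concentrated on the milestone.

```python
def pagerank(edges, seed, alpha=0.15, iterations=200):
    """Power iteration for PageRank with an arbitrary seed (teleport) vector.

    edges: undirected (a, b) pairs; seed: node -> weight (normalized here).
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    total = sum(seed.values())
    teleport = {n: seed.get(n, 0.0) / total for n in neighbors}
    score = dict(teleport)
    for _ in range(iterations):
        # Each node receives its teleport share plus equal shares from neighbors.
        nxt = {n: alpha * teleport[n] for n in neighbors}
        for n, outs in neighbors.items():
            share = (1 - alpha) * score[n] / len(outs)
            for m in outs:
                nxt[m] += share
        score = nxt
    return score


def pr_subgraph(pr, author, n_comments):
    """Hypothetical PR subgraph: milestone - PR - author, plus comment leaves."""
    edges = [("milestone", pr), (pr, author)]
    edges += [(pr, f"{pr}/comment{i}") for i in range(n_comments)]
    return edges


edges = pr_subgraph("clean_pr", "alice", 1) + pr_subgraph("messy_pr", "bob", 10)
nodes = {n for edge in edges for n in edge}

uniform = pagerank(edges, {n: 1.0 for n in nodes})   # canonical: every node is a source
personal = pagerank(edges, {"milestone": 1.0})       # personalized: only the milestone

for name in ("clean_pr", "messy_pr", "alice", "bob"):
    print(f"{name:10s} uniform={uniform[name]:.4f} personalized={personal[name]:.4f}")
```

In this particular toy graph the messy PR's lead over the clean one shrinks once the seed is the milestone, and the messy PR's author ends up with less than the clean PR's author, since the PR's score is split among more neighbors. The messy PR node itself still scores higher here, so how much dilution you see depends heavily on edge directions and weights. Treat this as a harness for experiments rather than a result.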

There is a lot more to explore here for sure, but I think the pagerank algorithm we implemented at the Odyssey hackathon is a big step towards being able to combat spurious activity through the focus on our “values” as the sources of cred.

Once the manual mode is merged into sourcecred/Sourcecred, I will construct and run this test case to demonstrate the impact of personalization on spammy activity.

This is interesting. Thanks.

As a start, we could give every piece of activity some default resistance [then] flow resistance along the cred graph.

To a first-order approximation, then, the current attribution essentially measures cost and calls it “cred”! It also takes reaction votes into account, yes, but those votes are weighted by your cred, which is ultimately a function of how much cost you’ve incurred.

If both “value” and “resistance” should flow along the cred graph, and “score” is roughly “value minus resistance”, then for the score metric to be useful there must be some fundamental difference between value and resistance that is discernible from the graph structure. What is it?

Getting resistance isn’t bad; it just means that something was costly.

Consider the Massive Refactor. It has a high cost, but presumably it decreases the cost of later changes: “make the change easy (this may be hard), then make the easy change”. The later nodes should have arrows back to the Massive Refactor. Do they flow cred to it? Or do they flow “negative resistance”? Why?

I don’t immediately have an answer to this question, because I don’t have an answer to my previous question: what is the fundamental difference between value and (negative) resistance?

Some characteristic examples

This is a nice list.


A few other thoughts:

  • It is important to distinguish between (a) a post that is inflammatory and thus invites bikeshedding and (b) a post that is coöpted by bikeshedding that could not reasonably have been predicted by the original post author. It’s a bad incentive for sloppy, high-cost work to increase your cred, but it’s also a bad incentive for bikeshedding on others’ PRs to decrease their cred.

  • Value is abstract. But resistance is more concrete: it measures a cost in time. Time has the interesting property of being mostly fungible for any given person, but not fungible across people. Perhaps there is something interesting in measuring resistance as “[predicted] value lost due to others’ time investment”.

    Again, this needs refinement. If you submit a mostly reasonable PR, and I spend a lot of time explaining why one minutia is Technically Incorrect, then I should be penalized more than you.

  • You say that a node “acquires resistance from its many children”. Recall that, as currently formulated, the core cred graph is not a DAG; it has no notion of children. Resistance should flow along “has-parent” edges, but not “references” edges (…probably?). This information, however, is domain-specific. Is resistance inherently a domain-specific notion? Probably not, right?

  • I put some effort and time into trimming down this comment to its essentials, in substance and structure. How could that signal possibly make it into the graph?

So this introduces an interesting heuristic: to start, the author of the PR “keeps” the cred associated with it, but if there’s more review activity then the cred “leaks” away to the reviewers. On the one hand, this creates an incentive to write PRs so that they require little review (good). On the other hand, it gives reviewers an incentive to pick apart everything and find lots of changes to suggest (bad) and maybe puts authors and reviewers in an adversarial position (also bad).
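A rough sketch of that "leak" heuristic, just to make the incentive visible. The function name `split_pr_cred`, the fixed author weight of 1, and the `comment_weight` of 0.25 are all made up for illustration. Nothing like this exists in SourceCred as far as this thread says.

```python
def split_pr_cred(pr_cred, author, review_comments, comment_weight=0.25):
    """Split a PR's cred between its author and its reviewers.

    review_comments: reviewer -> number of review comments on the PR.
    The author starts with weight 1; each review comment adds comment_weight
    to its reviewer, so more review activity leaks more cred away.
    """
    weights = {author: 1.0}
    for reviewer, count in review_comments.items():
        weights[reviewer] = weights.get(reviewer, 0.0) + comment_weight * count
    total = sum(weights.values())
    return {user: pr_cred * weight / total for user, weight in weights.items()}


clean = split_pr_cred(100.0, "alice", {"bob": 1})
messy = split_pr_cred(100.0, "alice", {"bob": 8, "carol": 4})
```

With these made-up weights the author keeps 80 of 100 on the clean PR and only 25 on the messy one, while a reviewer's take grows with every extra comment. That is the nitpicking incentive in numeric form.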

Getting this stuff right is hard. I think to a real extent we need to just experiment with it and see what kind of behaviors really happen, and how they are affected by the social context.

Yeah, I think that’s about right. IMO the current scores (at least at the user level) are mostly an activity metric. Depending on whether you think of activity as cost or benefit, you arrive at the statement above.

I think the difference is that the project will deem certain nodes as root sources of cred (cf Incorporating project goals in cred scoring), aka they are the seed vector for personalized PageRank. So the heuristic is: you get cred for creating activity that the project actually values, whereas you get resistance for activity in general.

Per the reasoning above, it depends on whether the project explicitly valued the later changes. If the later changes were a sequence of additional refactors, never enabling any project-level goals or milestones, then the whole subgraph is a swamp of resistance. Conversely, if the later changes were valued by the project (and blessed by connections to the project’s seed vectors), then the refactor gets cred too.

When I said “acquires resistance from its many children” I was specifically referring to a bikeshedding GitHub issue and its comments, so I meant children in the “has-parent” sense. You make a good point about how the semantics for resistance flow are domain-specific, and different from cred-flow semantics. We could imagine having “cred weights” and “resistance weights” on each edge, but that seems quite cumbersome.

  • Making the post more concise likely increases the positive (cred-bearing) engagement signals, e.g. more likely to be quoted, otherwise referenced, or liked.
  • We could imagine having resistance heuristics on posts wherein higher-length posts are higher-resistance.

Having feedback from humans would be most helpful here. Imagine that there’s a downvote analogue that allows people to apply additional resistance (e.g. I would love a mechanism to apply resistance to low-value meetings that I’m invited to). Then the community could apply resistance to the posts that start the bikeshedding rather than to the PR itself.

One thing that is not common in the PageRank literature but totally within our power is to have a seed vector with a uniform component and a personalized component. That is to say, all nodes will induce some cred but weakly compared to the sources identified.
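As a sketch of what such a seed vector could look like, assuming a hypothetical helper `mixed_seed` and an arbitrary `uniform_share` parameter (both invented here): a fraction of the seed mass is spread evenly over all nodes, and the rest is split among the identified sources.

```python
def mixed_seed(nodes, sources, uniform_share=0.1):
    """Seed vector with a uniform component and a personalized component.

    uniform_share of the mass is spread evenly over all nodes; the remaining
    (1 - uniform_share) is split evenly among the chosen source nodes.
    """
    nodes = list(nodes)
    sources = set(sources)
    if not 0.0 <= uniform_share <= 1.0:
        raise ValueError("uniform_share must be between 0 and 1")
    if not sources or not sources.issubset(nodes):
        raise ValueError("sources must be a non-empty subset of nodes")
    base = uniform_share / len(nodes)
    boost = (1.0 - uniform_share) / len(sources)
    return {n: base + (boost if n in sources else 0.0) for n in nodes}


seed = mixed_seed(["milestone", "pr", "comment", "author"], ["milestone"], uniform_share=0.2)
```

Setting `uniform_share=1.0` recovers the canonical uniform seed and `uniform_share=0.0` the fully personalized one, so a single knob covers the whole range in between.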

Strongly agree that a lot of experimentation and exploration will be required. What is most effective will depend a lot on the community norms.

While I do think adding human feedback (downvoting/upvoting/etc.) is a crucial piece, I want to challenge the assumption that tracking just the value of activity is bad. In my project (and others I’ve seen), when it comes to code (forums may be a different issue), people who engage in the behavior we’re trying to avoid (bikeshedding, low-quality contributions that drain time from other contributors, etc.) often don’t last long anyway. They can only progress if others continue to help them. In my project, if someone submits their first PR and it requires tons of help, they’ll usually get it. Maybe they do that a second time. But more than that, and they get the cold shoulder. Their comments go unanswered, and that question in the dev channel (coldest place on earth) goes unanswered too. I’ve yet to see anyone last long unless they’re starting to deliver value.

I would caution against over-engineering here (here being specifically code contributions). If natural human dynamics are already solving a very hard problem IRL, let them continue doing it. Or just give them tools to help automate that process. But making subjective value judgements in code is not only a difficult problem, but a slippery slope. It’s very hard, for instance, to tell if that PR with a lot of comments is from a poor coder that needs lots of help, or simply a difficult problem that necessitates a lot of discussion. I can think of a few important PRs off the top of my head just like that.

That’s a really astute insight @noman, thank you.

Based on the feedback from this thread, I think it’s clear that this idea is half-baked, and that it solves a problem we aren’t yet sure exists with SourceCred. So it’s interesting to discuss, and we can definitely back-burner it until it’s clear that it would solve a real need.
