On Code Review Cred

decentralion · September 23, 2019, 2:16pm

In the Preliminary CredSperiment Cred mega topic, we had some interesting discussion around valuing code reviews.

@Beanow you bring up some very good points. Conceptually, I think we’re finding that “amount of activity on a PR” is a very bad proxy for “value of the PR”, since lots of activity is more likely to correspond to conversational debugging, controversy, or perhaps bikeshedding–none of which are robust indicators of value.

A better approach may be to have the value of the PR be independent of the activity, but the cred from the PR flows out based on the activity. So a PR that merges without any fuss gives most of the cred to the author; a PR that involves a long conversation prior to merge splits the cred between the participants. Of course this would introduce its own issues (e.g. people incentivized to bikeshed so they can take cred from a PR) but for now I think it would be a better heuristic.
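To make the "fixed value, split by activity" idea concrete, here is a minimal sketch. It is not SourceCred's actual algorithm: the function name, the rule that authoring counts as one unit of activity, and the proportional split are all assumptions made for illustration.

```python
from collections import Counter


def split_pr_cred(pr_cred, author, activity):
    """Split a fixed amount of cred between a PR's participants.

    `activity` is a list of usernames, one entry per comment or review.
    Assumption: opening the PR counts as one unit of activity for the author.
    """
    counts = Counter(activity)
    counts[author] += 1  # authoring the PR is itself one unit of activity
    total = sum(counts.values())
    # Each participant's share is proportional to their activity count.
    return {user: pr_cred * n / total for user, n in counts.items()}


# A PR that merges without any fuss: the author keeps all of the cred.
quiet = split_pr_cred(5.0, "alice", [])

# A PR with a long conversation: the same 5 cred is shared out.
busy = split_pr_cred(5.0, "alice", ["bob", "bob", "carol", "alice"])
```

Note that the total is constant in both cases, which is exactly what makes bikeshedding profitable under this heuristic: every extra comment moves cred away from the other participants.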

Preliminary CredSperiment Cred

I would like to suggest one bit of nuance to this. I feel like the value of a PR is:

The value of the commits (accepted or not).

Having gone through the collaborative process to improve its quality and support base within the community.

Hence I feel like a “no fuss” PR being valued the most does not incentivize the collaboration side of things. PR reviews in particular I feel are valuable, but I do think there is a point of diminishing returns. 10 reviews do not add much over 3 reviews, and 100 comments don’t add much over 10 succinct ones.

So I wouldn’t encourage a system of: each PR is 5 cred, shared between all interactions. I would suggest the first 3 reviews and 10 comments are additive. From there on, the cred is split between interactions, or some kind of diminishing returns curve is applied to their value. Incentives-wise, I feel like this creates space for dedicated / regular reviewing roles. Of course, ideally this would be something you can weight to suit your community.

@Beanow makes a really good point here. SourceCred should actively support and encourage the collaborative norms of open-source. However, the approach I described originally could have the opposite effect. If each PR is only “worth” a certain amount of cred, and it gets split between the author and reviewers, then authors may resent reviewers for “taking” some of “their” cred. This would be socially harmful. It’s also just “wrong” in that it fails to model the value creation from code reviews. Code reviews both improve the quality of the code and distribute knowledge about changes to the code, thereby decreasing the project’s bus factor. So good reviews should create more cred, since they create more value.

However, I’m concerned by the idea of just unconditionally giving cred to the reviews, especially giving more cred to the first reviews. It defines a game whose rules are “earn cred by reviewing as many pulls as soon as possible”. Not the best incentives.

One solution could be to have the extra cred of a pull get “unlocked” if the pull request author gives it a positive reaction. Basically, if the pull author gives a 👍 or ❤️ reaction to the review, above and beyond the standard “flow cred from reacting user to thing being reacted to” rule, it would create a special “review appreciated” edge from the pull itself to the review. This special edge would come with some newly minted cred (configurable per weights, of course).

This would change the game from “review as many as fast as possible” to “write reviews that the pull request author explicitly appreciates”. By putting a human in the loop, I think it will mitigate a lot of attacks where people earn cred via spammy or superficial reviews.
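A rough sketch of how such a “review appreciated” rule could be detected. Everything here is hypothetical: the `Review` type, the reaction shortcodes, the weight constant, and the rule that authors cannot appreciate their own reviews are assumptions for illustration, not existing SourceCred code.

```python
from dataclasses import dataclass, field

# Assumed reaction shortcodes that count as appreciation.
APPRECIATION_REACTIONS = {"+1", "heart"}
# Hypothetical amount of newly minted cred per edge (would be a configurable weight).
REVIEW_APPRECIATED_WEIGHT = 2.0


@dataclass
class Review:
    review_id: str
    reviewer: str
    # Maps reacting user -> reaction shortcode.
    reactions: dict = field(default_factory=dict)


def review_appreciated_edges(pr_id, pr_author, reviews):
    """Return (source, destination, minted_cred) edges from the pull to each
    review that the pull's author reacted to positively."""
    edges = []
    for review in reviews:
        if review.reviewer == pr_author:
            continue  # assumption: authors cannot unlock cred for their own reviews
        if review.reactions.get(pr_author) in APPRECIATION_REACTIONS:
            edges.append((pr_id, review.review_id, REVIEW_APPRECIATED_WEIGHT))
    return edges


reviews = [
    Review("r1", "bob", {"alice": "+1"}),       # appreciated by the PR author
    Review("r2", "carol", {"dave": "heart"}),   # reaction from a bystander only
    Review("r3", "alice", {"alice": "heart"}),  # self-review by the PR author
]
edges = review_appreciated_edges("pr-1", "alice", reviews)
```

Only `r1` yields an edge here. Reactions from bystanders would still flow cred through the standard reaction rule, but they would not mint anything new.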

Beanow · September 23, 2019, 5:01pm

I don’t think this is a bad incentive. It encourages velocity, as well as digging up stale PRs that haven’t had many reviews. But I agree we shouldn’t encourage a race to be the first with poor quality reviews. Hence I suggest we have multiple “full cred” spaces with a considerable drop off after that. For example 3 reviews full cred and negligible cred when you get to around 7 reviews. With such a large capacity for full cred reviews, any people gaming this with poor reviews should be easily identifiable for moderation, while leaving space for legitimate reviews.
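For illustration, one possible shape for such a curve: full weight for the first 3 reviews, then a drop to a small floor by the 7th. The linear interpolation and the floor value of 0.05 are assumptions, since the post only fixes the two end points.

```python
def review_weight(position, full_slots=3, negligible_at=7, floor=0.05):
    """Weight for the n-th review on a PR (position is 1-based).

    Reviews in the first `full_slots` positions get full weight. From
    `negligible_at` onward the weight is `floor`. In between, the weight
    drops linearly (an assumed shape; any decreasing curve would do).
    """
    if position < 1:
        raise ValueError("position is 1-based")
    if position <= full_slots:
        return 1.0
    if position >= negligible_at:
        return floor
    span = negligible_at - full_slots
    return 1.0 - (1.0 - floor) * (position - full_slots) / span


# Weights for the 1st through 8th review on the same PR.
weights = [review_weight(n) for n in range(1, 9)]
```

Because the parameters are plain arguments, a community could tune the number of full-cred slots or the steepness of the drop-off without changing the rule itself.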

What I dislike about needing a positive response is that people need to be aware of this. Analysis of existing projects in particular won’t be great, and going forward it asks for education. Moving this responsibility to the PR author also leaves the system open to a team of 2 gaming it. I prefer moderating people who try to game the system, as that can be a specialist task, like the cred historian, that is in itself worth cred, while everything stays pretty much automatic when there are no bad actors.

Beanow · September 23, 2019, 5:17pm

Also realised, reactions to unlock cred are effectively binary judgement calls of quality. This feels a lot like a Black Mirror episode where your treatment and opportunities in life depend on the way people rate you.

A linear heuristic or fixed values seem preferable to the can of worms that is actively needing people to rate people.

decentralion · September 23, 2019, 6:59pm

You raise some good points; both that using the system I described creates more cognitive load / need for learned practices by PR authors, and that it creates potential social tension where the reviewers feel that they deserve a 👍 and didn’t get it. Starting with a “default trust” assumption and relying on moderation / social pressure to curb bad behavior makes sense.

It makes me think of @mzargham’s model of three phases of trust in SourceCred. While we are in the “high trust / low adversarial” mode, we can use heuristics that might be brittle at scale, like giving cred to the first pull requests. Then maybe we can skip straight past the “explicit feedback from authors” and towards having impartial curators make necessary judgement calls and adjustments based on the value of the PR.

This will make the behavior of the system more consistent (curators can have a more “standardized” policy and playbook, vs idiosyncrasies of the individual pull authors), and moves the cognitive load off of every contributor and onto a specialized bureaucracy.

For right now, I don’t think we even need the “adjust cred based on # of reviews” heuristic; it’s extremely rare to see a pull with >2 reviews. So to start, we can just give PR reviews a positive weight (the existing behavior). And we can revisit if / when we feel like it’s no longer working.

Beanow · September 24, 2019, 5:11pm

Definitely unlikely to need this for reviews. But it may be worth having a look at applying this to PR comments. And, if not already implemented, decreasing the value when the PR is long closed / merged.

s_ben · September 25, 2019, 7:03am

Note above quote is actually from @Beanow (which was nested in another quote).

I like the deep thought put into a formula for scoring PR reviews. Part of me likes simplicity generally. But if we can come up with something a little more complex that better captures the average dynamics at play here, we can introduce some objectivity (key to making it resistant to subjective gaming (i.e. corruption)).

I think that 👍 and ❤️ are fairly widely used and typically “socially weighty” signals (though you both have much more experience in OSS than I do). I like the idea of making use of existing signals rich in information.

Already @decentralion, @burrrata and I have noticed our behavior on Discourse change due to knowing cred was being assigned, even before money was involved. If real cred scores, and possibly real money, are riding on this, I think it’s reasonable to assume people will become more aware of what their actions mean. I don’t think this is necessarily a bad thing. We are essentially going to have to make a bunch of subjective judgements, continually, for this to work. Making use of “informal voting” that occurs on the social level seems efficient and requires less education than learning some new mechanic. I think we’ll also see this power dynamic no matter what in the form of what gets merged. That is a main metric used in the Decred DAO. I essentially don’t get paid for things that don’t get merged (it’s not that simple, but that signal is important).

This is all Black Mirror territory :) I think the main difference here will be the transparency of the algorithm and “due process” by your peers, not giant unaccountable corporations.

I see what you mean here. Maybe this is better. Though I think that if you make enough binary choices, they add up to a collection of smooth curves over time.

Humans distributing credit and money will always cause social tension. I think it’s best to not try to increase or decrease what is naturally there, per se, as that information may be necessary to properly negotiate between ourselves.

I definitely think this is going to be a key piece of this.

I’ve been thinking about this idea of impartial curators. The problem is that accurately judging the value of a contribution may require a lot of knowledge specific to the project or task.

I shiver at the word bureaucracy. Makes me think of the current Wikipedia bureaucracy, which is horrible and plagued by corruption scandals. I do think that there are ways to do this with the right incentives though. For instance, calling a random sortition from a group of qualified curators perhaps. Gaming cred could be the kind of thing where a general “rulebook” can be created. And besides, gaming should be one of those, “you know it when you see it” kind of things.

Beanow · September 25, 2019, 2:00pm

Transparency is one aspect. But not all. This particular episode I’m thinking of (I’ll look it up) shows that while the peer votes and algorithm were transparent, people used the votes as a power over others or even retaliation for menial unrelated reasons. It became a game filled with politics and power dynamics, but very impoverished as a “human in the loop” rating system.

Having cred for a piece of work locked behind a binary rating with one person (the PR author) having the ability to unlock it seems ripe for this kind of political / strategic / retaliation voting.

It may lead to some really unhealthy incentives. For example, if you depend on SourceCred-based income, you may avoid reviewing work by the same author too often, because the more you review that person’s work, the more power you give that one person to deny your income. To mitigate that risk, you must stop reviewing. Or worse, you incorporate the things you would have suggested into a new PR of your own, to try and compete for the cred or evade their voting powers.

For the same reason (I think), in many systems that have scoring and voting, a negative vote comes at a cost to the voter as well. In SourceCred terms, you may have to expend some cred or mana in order to deny cred for a PR review. Perhaps this act of moderation may be worth cred if it finds a consensus?

s_ben · September 25, 2019, 4:18pm

I did see the episode. It’s a good cautionary tale. Though the whole point of the show is to depict dystopian outcomes. You could have written a utopian version of the episode :)

To play devil’s advocate, the same mechanism could be used to “freeze out” someone who is genuinely toxic, potentially abusive.

A negative vote coming at a cost makes sense. Perhaps this is solved/mitigated by “cred defenders” or some kind of moderation. Perhaps you have to pay (or put at risk) some cred/mana to propose a reduction to someone’s cred.

Yes, though do acts find consensus, or do humans? Perhaps an act of moderation is worth cred, unless participants reach consensus that the moderation was unfair and it is overturned, in which case it’s worth negative cred. Here, I keep coming back to the problem of collusion/coercion. If a person or group is powerful enough (say they’re the one who reviews your PRs, causing you to get paid, which you’re relying on), and is capricious in their moderation, or is just going after someone else while playing power games, even subtly, are you going to vote publicly that their moderation was anything but honest, calling them out publicly? Here, I think anonymous voting (the long history of that in electoral democracies is a good reference point) or an impartial third party (or “second party”: people in the project with enough distance, and power, to be objective) could be useful.

Beanow · September 25, 2019, 5:29pm

For reference, the episode I meant is Nosedive (S3, EP1). Of course the point is to paint a dystopia, and technically, being fiction, it is less than anecdotal evidence, so it needs to be taken with a handful of salt.

I’m just worried a binary cred unlock system has the potential to be detrimental in this way. As a rule of thumb, I think a safe route is to let moderation of bad actors be a dedicated system and to focus on how you can let the base algorithm more accurately quantify the value of contributions, without creating harmful incentives.

I definitely think explicit approvals can improve accuracy, but I believe they come with the risk of creating harmful incentives. While a diminishing returns curve can also create an incentive to race people with low quality contributions to get the high cred slots, it’s a lot less prone to worst case scenarios on the level of blackmail and power struggles.

s_ben · September 27, 2019, 2:23am

Generally agree. Would rather not rely on binaries if we can solve this in other ways.

decentralion · September 27, 2019, 9:00am

I should re-watch Nosedive and write a post, something like “On building non-dystopian reputation systems”. I’ve been wanting to build reputation systems for years (since at least 2014), and what stopped me in the past was the expectation that it would turn, like Nosedive, into a dystopian system.

IMO, there are a lot of issues in Nosedive, e.g. you can rate people without their consent or even any “grounds” to rate them. But the biggest issue by far is that it’s a single unified panopticon-style system, rather than a patchwork of community perspectives. This is part of why I want to avoid having centralized cross-community cred instances, and why I think the ability to exit and fork cred is so important.

burrrata · October 4, 2019, 10:29pm

A lot of the problems here are due to spam and a zero-sum model. This addresses the spam, but the game still feels zero-sum. What if this process was “positive-sum” in that it did not re-allocate Cred from the author’s PR, but instead created new cred for the reviewer? (btw not sure if creating a “review appreciated” edge creates new cred or just redistributes existing cred)

Likes and hearts are simple enough that I think the learning curve would be manageable.

Good point! This would only work in specific instances where projects are specifically playing the cred game.

This could be an opportunity for feedback where if you felt you deserved a 👍, but didn’t get it, you could reach out to the author to ask why and how to improve. This would help reviewers and/or help the author fix a mistake or oversight.

This feels like the best way to go. The trick then is to incentivize curators so that they do good work and there’s enough to go around.

This will, imho, result in the game reflecting the views and values of the society that plays the game. In this sense, just like democracy, it only works with informed, engaged, and compassionate players. The game is only as good as the players (assuming players can use cred to vote/influence the alg params)

Burning some of your own cred to slash someone else is kind of what Erasure does (see “griefing”). I’m not the biggest fan of that model, however, as it relies on altruism (taking one for the team) or anger (an eye for an eye).

I like the “cred defenders” idea. Much better to incentivize people to find and flag spam/errors. This works great for shorting stocks, finding bugs in code, and creating cryptoeconomic consensus on blockchains. Seems much more robust than hurting yourself to hurt others.

In addition, perhaps a system could be set up where defenders have to stake some cred to earn the right to fulfill that role, and if they’re found to be abusing their power then their cred is slashed. In the case of an honest (and educated) majority this would work.
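A toy sketch of what stake-and-slash bookkeeping for defenders might look like. The `DefenderRegistry` class, the minimum stake, and the fractional slash are all invented for illustration; how abuse would actually be established (the honest-majority part) is deliberately left out.

```python
class DefenderRegistry:
    """Tracks cred staked by users who want to act as cred defenders."""

    def __init__(self, balances, min_stake):
        self.balances = dict(balances)  # user -> liquid cred
        self.min_stake = min_stake      # hypothetical threshold to hold the role
        self.stakes = {}                # user -> staked cred

    def register(self, user, amount):
        """Move `amount` of the user's cred into their stake."""
        if amount < self.min_stake:
            raise ValueError("stake below minimum")
        if self.balances.get(user, 0.0) < amount:
            raise ValueError("insufficient cred")
        self.balances[user] -= amount
        self.stakes[user] = self.stakes.get(user, 0.0) + amount

    def is_defender(self, user):
        """A user holds the role only while their stake meets the minimum."""
        return self.stakes.get(user, 0.0) >= self.min_stake

    def slash(self, user, fraction=1.0):
        """Burn a fraction of the user's stake and return the amount burned.

        Deciding that abuse occurred is out of scope for this sketch.
        """
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        penalty = self.stakes.get(user, 0.0) * fraction
        self.stakes[user] = self.stakes.get(user, 0.0) - penalty
        return penalty


registry = DefenderRegistry({"dana": 100.0}, min_stake=20.0)
registry.register("dana", 40.0)
was_defender = registry.is_defender("dana")
burned = registry.slash("dana", fraction=0.75)
still_defender = registry.is_defender("dana")
```

In this sketch a large enough slash drops the stake below the minimum, so the defender loses the role without any separate removal step.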

Good point! Even if there is an honest majority of cred defenders, it could still not be economically rational to speak out against popular people. In this case, pseudonymous actors, or even anonymous voting, would help immensely.

Looking forward to it!

Power laws and network effects are real. While you can fork, why would you want to? This is a very interesting design space, and more research needs to be done to create incentive mechanisms to fork or not to fork. MolochDAO does a decent job with this, where “forking”, or leaving the org, actually gives you your money back. This way all participants are incentivized to make decisions that are in the interest of all parties involved, otherwise people will leave. Since money (really Ethereum) has value outside of MolochDAO, this works. With social networks, however, your identity/reputation only has value inside that network. Similarly, I can’t directly take my ETH to Bitcoin. I can, however, sell my ETH and buy BTC. Having lots of bridges or markets to trade cred would be important. Then people have a meaningful exit incentive.

This of course creates lots of other problems, but it allows people to exit in a meaningful way.

decentralion · October 5, 2019, 5:01am

Zeroing in on one note (from a post which had a bunch of great content): I am willing to significantly sacrifice analysis of existing projects in order to make SourceCred perform brilliantly with informed players. The way to change the world is to make something that works brilliantly and then scale it, not to make something that sort of kind of works in a lot of medium use cases.

I think this was my main mistake in my development trajectory for SourceCred over the past year: I focused too much on trying to improve the median use case (couldn’t make it good enough to be relevant) rather than zeroing in on one use case where we can make it sing.

References:

Topic	Replies	Views	Activity
Cred, Cost, and 'Resistance' Research	7	1191	April 24, 2019
Weekly Cred Analysis -- Week ending Jan 12 The CredSperiment	3	923	January 15, 2020
Discourse from Day 1 The CredSperiment	4	961	September 4, 2019
Preliminary CredSperiment Cred The CredSperiment	20	3546	October 6, 2019
Building SourceCred for the Wider World (Pt. 2: Get to your god damn point!)	0	999	December 17, 2020
