A more balanced allocation policy

A grain allocation policy is a procedure for determining how much grain should be given to each contributor, given as inputs the current cred distribution-over-time and the history of past grain allocations. We currently have an “immediate” policy that considers only recent weeks, to quickly reward good work on the project, as well as a “balanced” policy that pays greater amounts to people who are “underpaid”. A person is underpaid if the share of grain that they’ve been paid is less than their current share of cred.

@decentralion and I noted in a discussion that our current “balanced” policy has the property that if cred inflation matches or outpaces grain inflation, the balanced policy actually turns into another short-term policy. In particular, old contributions may not receive any grain, because although new grain is being minted and thus the old contributions are being proportionally underpaid, they’re drowned out by all the new contributions.*

We think that this property is undesirable. If someone makes a valuable contribution early on, and that contribution continues to provide value to the project for years to come, the contributor should continue to earn grain for that contribution.

Furthermore, after significant changes to the weights or graph structure that cause there to be a high concentration of underpaid nodes, the current system will send approximately all payments to those nodes. This is problematic for anyone relying on income streams, which may halt until the distributions have a chance to catch up.

We discussed the idea of a more balanced “balanced” policy that pays people in proportion to how much they’ve been underpaid, as long as this proportion doesn’t veer too far from their current total cred. I commented that I think that this can be done without too much difficulty, with a pointwise correction for overpayment (through some non-linear activation; a sigmoid probably works nicely) followed by normalization.
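The "pointwise correction + normalization" idea can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the notebook's actual code; the function names, the choice of a logistic sigmoid, and the scaling by `k` are all assumptions.

```javascript
// Logistic sigmoid, used here as a smooth "underpayment" activation.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// credShares and grainShares are arrays of fractions summing to 1.
// Returns this week's allocation shares (also summing to 1).
function correctedAllocation(credShares, grainShares, k) {
  const raw = credShares.map((cred, i) => {
    // Positive when someone's grain share lags their cred share,
    // negative when it leads.
    const underpayment = cred - grainShares[i];
    // Pointwise non-linear correction: the sigmoid damps extreme
    // over/underpayment so no one node absorbs the whole harvest.
    return cred * sigmoid(k * (underpayment / Math.max(cred, 1e-12)));
  });
  // Normalize so the shares sum to 1.
  const total = raw.reduce((a, b) => a + b, 0);
  return raw.map((x) => x / total);
}
```

The key property is continuity: an underpaid participant's share rises smoothly with their underpayment, rather than jumping between "everything" and "nothing".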

I’ve now sketched this out. Here’s an interactive notebook: https://observablehq.com/@wchargin/balanced-distribution-sketch


A key difference between this policy and the current implementation of the “balanced” policy is that this policy is continuous in both the effect size and the underpayment amounts, whereas the current “balanced” policy has a threshold past which all payments are just zero.

* Note: I haven’t carefully worked out the exact dynamics and necessary conditions here, but this seemed plausible to both of us on the call.

Disclaimer: With the current values of SourceCred on SourceCred, such a policy change would probably benefit me (@wchargin) personally. But, as discussed above, our motivations for this change are not self-serving.


I should look more carefully into these parameters and their implications. In particular, my comment that the output distribution tends to the underpayment distribution as k goes to infinity is not correct. There are surely variants on this with different properties both asymptotically and transiently, which we can tune to our liking. Take this primarily as a proof of concept that a better policy like this can exist without too much trouble.

@wchargin I’m not up enough on my maths to comment on the details there, but would you say your proposal is a rough proof-of-concept model for the (new) BALANCED policy proposal as outlined on our brand guidelines page*?

It’s called BALANCED there, but the current BALANCED as we have it today gets renamed to EQUALIZER in that proposal.

* Which I should really move to Discourse, haha.

Thanks for producing this prototype @wchargin. Feedback on the prototype itself is that the edge cases don’t work and the model is prone to spitting out NaNs, which makes it hard for me to experiment with it to get intuition for the algorithm. (The edge cases are often most instructive…) Also, it would be nice to be able to look at the 2-user case rather than the 4-user case, I think that the 2-user case can still express the interesting dynamics.

Feedback on the algorithm/signature: it would be nice if k has an intuitive significance, e.g. "k indicates the proportion by which someone’s share of the distribution may deviate from their share of the total cred in order to account for underpayment". In that case k=1 implies that we will pay exactly according to long term Cred, k=Infinity implies the policy currently called “BALANCED”, and intermediate values like k=2 or k=4 would be actually “balanced” approaches.

One implementation advantage of an approach that adds a parameter is that we can keep calling the policy “BALANCED” for backwards compatibility, and recommend non-extreme parameter choices like 2 and 4, but port the legacy “BALANCED” policies as having implicitly chosen Infinity as the parameter. (Then we might add in "EQUALIZER" as a moniker for balanced(Infinity).)
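One hypothetical parameterization with exactly these semantics (illustrative names, not the actual SourceCred API): blend the cred distribution and the underpayment distribution with weight `1 - 1/k`, so `k = 1` pays purely by lifetime cred, `k = Infinity` reproduces the legacy "BALANCED" behavior, and everyone is guaranteed at least `credShare / k` of each harvest.

```javascript
// credShares and grainShares are fractions summing to 1.
// allocation = (1/k) * credShare + (1 - 1/k) * underpaymentShare
function balancedAllocation(credShares, grainShares, k) {
  // Underpayment of each participant, floored at zero.
  const under = credShares.map((c, i) => Math.max(0, c - grainShares[i]));
  const totalUnder = under.reduce((a, b) => a + b, 0);
  // Normalize underpayment into a distribution; if nobody is
  // underpaid, fall back to paying by cred shares.
  const underShares =
    totalUnder > 0 ? under.map((u) => u / totalUnder) : credShares.slice();
  const w = k === Infinity ? 1 : 1 - 1 / k;
  return credShares.map((c, i) => (1 - w) * c + w * underShares[i]);
}
```

Since both component distributions sum to 1, the blend does too, and because the underpayment term is non-negative, each participant's share is bounded below by `credShare / k`.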

Agreed. I think we want to avoid overpaying early contributors, which arguably happens in many startups, where only the founder, VCs, and first few employees get outsized returns, and the rest get crumbs or nothing (this happened to me personally). However, you should be rewarded for being early. It’s higher risk, typically more trauma (therapy is expensive), and your contributions often lay the foundation for much of the value created later on.

There’s also an important incentive dynamic here. I’m much more likely to contribute to a new, risky project if there’s the possibility of big returns. If the project says, “well, this complicated algorithm that measures value may stop paying at some point if the project grows fast, and we can’t really estimate how much”, that’s not very motivating. If I get royalties forever, according to a fair valuation of the value I added, even if I eventually get diluted way down to a trickle of Grain, that’s very motivating. If the project truly blows up (e.g., becomes the next Google), then that highly diluted stream is still going to be worth a lot. As it should be.

The way I frame this in my own personal morality (not for everyone) is: if you want to never work again and sit on a beach in Thailand sipping Mai Tais, you’d better be able to look the hardworking Cambodian immigrant who will never retire in the eye and know you paid for it.

Have been wondering about this issue, glad to see it addressed. The volatility here could be bad enough that it creates competitive, zero-sum dynamics, where people don’t want new contributors in the project just because they can’t risk not paying rent next month.

So, I’m sure this is a very nice algorithm, which addresses a lot of issues in an elegant way. I’m worried about the complexity, though. It’s already hard enough to explain this stuff to people. And the payout mechanism is something people will want to understand very well before using it for compensation. The educational barrier for many communities may just be too high. Even I, knowing this system very well and being pretty familiar with math, can’t really estimate how much Grain I’ll be making. Part of that is just that we’re early and prototyping. But considering the complexity of the core Cred calculation algo, I wonder: will anyone be able to understand the whole system well enough to make estimates and income guarantees to their contributors? Also, if something goes wrong and the Cred scores and Grain distributions seem off, will communities be able to debug it? Or will they be reliant on us, who may not be available for that?

What about something much simpler? Pay according to 80/20 lifetime Cred / last week Cred? Lifetime Cred never goes to zero. The ratio could be adjusted by communities to negotiate between old guard and new.
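The simpler blend suggested here is easy to sketch (illustrative names; the 80/20 split is the example from the post and would be a community-tunable parameter):

```javascript
// Pay 80% by lifetime cred share and 20% by last week's cred share.
// Both inputs are arrays of fractions summing to 1.
function blendedAllocation(lifetimeShares, lastWeekShares, lifetimeWeight = 0.8) {
  return lifetimeShares.map(
    (life, i) => lifetimeWeight * life + (1 - lifetimeWeight) * lastWeekShares[i]
  );
}
```

Because lifetime cred share never drops to zero for a past contributor, the lifetime component guarantees their payout never fully dries up.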

I know there’s this desire to replicate the stability of a traditional job process: we like you, so we’ll start paying you a living wage right away. But I wonder if this causes compromises that make the system less balanced over time. Perhaps this doesn’t need to be addressed in the core algorithm? Maybe communities can just create their own equivalents of PL’s sponsorship program?

Just some thoughts. Perhaps this algorithm is perfect for SourceCred. Just wondering how suitable it will be for other communities.

Fixed NaNs (safe division; continuous). Added participant-count slider.

Yes, I like this. I’ve sketched out some variants with this property, but I’m not quite happy with them yet. I’ll keep iterating, and I’m also going to add an output view that shows how the allocation changes over time with successive distributions.

Well put. I wholeheartedly agree with your first two paragraphs.

I don’t think that this is meaningfully more complex than our current strategies. We need to be able to explain the concepts and the dynamics to everyone (incl. the mathematically disinclined), but not the formulas. My offhand comment there was for people who do want a high level view of the technical details without reading the code itself. (And, for what it’s worth, the current version of the code is basically one line.)

We need to be able to communicate with mathematical precision in some contexts and general accessibility in others. I don’t think that we should feel compelled to water down the algorithms just because not everyone will understand all the details.

This is, imho, the main point of the k-factor approximation. With that structure, we can say, “look, if you have 1000 cred, and the project has set k = 4, then you’re guaranteed to get at least 250 grain every week”. Of course, there’s still uncertainty in case your cred dilutes dramatically or the policies change. But in the usual case, this provides an easily understandable link between cred and grain.

This works nicely except in the case of large changes to the cred (e.g., weight changes) that may cause people to become underpaid or overpaid, or may correct such a discrepancy.

If you contribute primarily via (say) Discord, and we later realize that the Discord weights are too low, we want to “catch you up” to what you would have earned if the weights had been correct the whole time. This is important because it means that people don’t have to worry too much about arguing about the weights all the time. As long as you’re confident that the weights will be corrected in the future, and you’re not in immediate cash-flow trouble, you’re not penalized in the long run for waiting. Similarly, if you’re negatively affected by a weight change that later gets reverted, you’re caught up as if nothing happened. This should make the social dynamics around weight changes more collaborative and comfortable.

Can this actually be guaranteed? If there is a finite amount of Grain being distributed every Harvest, wouldn’t more newer Participants in the project start earning enough Cred and Grain to slowly diminish this amount?

I like this part and find it intuitive and clear…

…while this leaves me with unanswered questions:

  1. If you have “over-earned” Cred (and subsequently, Grain), wouldn’t this mean that you might be presented with a sudden drop in Grain distributions?
  2. Wouldn’t that cause people with short-term cash flow needs to be hesitant if they don’t know reliably when and how Cred weighting might get adjusted?

Good catch; thank you. More accurately, this should be defined in terms of percentages. The above “guarantee” holds if the amount of grain distributed each week equals the amount of cred in the project as of that week. But more generally, it should be: “if you have 10% of the cred, and the project has set k = 4, then you’re guaranteed to get at least 2.5% of the grain every week”. And that we should be able to guarantee.
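Stated as a one-liner (illustrative, with the numbers from this thread), the percentage version of the guarantee is just:

```javascript
// With cred share s and parameter k, your grain share each week is
// at least s / k, regardless of how much total grain is minted.
function guaranteedGrainShare(credShare, k) {
  return credShare / k;
}
// e.g. 10% of the cred with k = 4 guarantees at least 2.5% of the grain.
```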

With more explanation this all sounds good. I will note that a cost of this complexity, even if worth it (which seems the case), is a higher educational barrier for those new to SourceCred, and more power/trust in those who can understand (and potentially modify) these algorithms. I think I can wrap my head around this (with a few more read-throughs :sweat:), and explain the high-level concepts and dynamics in docs and other materials. What we’re really doing though, as with the main Cred algo I suppose, is creating guarantees about claims. None of which will be 100%. And we’re bound to see those claims and guarantees tested, perhaps in adversarial environments where complexity increases the attack surface. That is the basic tradeoff we’re making, I believe.


Realistically, the vast majority of participants aren’t going to understand the implementation of the algorithm. Rather than focusing on having a simple implementation, we should focus on having intuitive behavior. The “BALANCED” policy is simple, but sometimes results in huge and un-intuitive shifts in compensation (go from being paid a lot to being paid 0 due to a weights change). In contrast, the newer policy we’re working on here may be more complex, but I think the behavior will feel more intuitive to our users. (Go from being paid with a large multiplier to being paid with a smaller multiplier due to a weights change.)

It’s kind of like how the iPhone is massively more complex than a terminal, yet for 99% of people it’s easier to use and more intuitive.