A more balanced allocation policy

A grain allocation policy is a procedure for determining how much grain should be given to each contributor, given as inputs the current cred distribution-over-time and the history of past grain allocations. We currently have an “immediate” policy that considers only recent weeks, to quickly reward good work on the project, as well as a “balanced” policy that pays greater amounts to people who are “underpaid”. A person is underpaid if the share of grain that they’ve been paid is less than their current share of cred.
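
(For concreteness, the shape of such a policy is roughly the following function signature; the type names here are illustrative, not the actual SourceCred API.)

```ts
// Illustrative only: not the actual SourceCred types.
type ContributorId = string;

type AllocationPolicy = (
  // cred earned by each contributor in each past week (the cred distribution-over-time)
  credOverTime: Map<ContributorId, number[]>,
  // grain already paid to each contributor in past distributions
  pastGrain: Map<ContributorId, number>,
  // grain budget for this distribution
  budget: number
) => Map<ContributorId, number>; // grain to pay each contributor now
```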

@decentralion and I noted in a discussion that our current “balanced” policy has the property that if cred inflation matches or outpaces grain inflation, the balanced policy actually turns into another short-term policy. In particular, old contributions may not receive any grain, because although new grain is being minted and thus the old contributions are being proportionally underpaid, they’re drowned out by all the new contributions.*

We think that this property is undesirable. If someone makes a valuable contribution early on, and that contribution continues to provide value to the project for years to come, the contributor should continue to earn grain for that contribution.

Furthermore, after significant changes to the weights or graph structure that cause there to be a high concentration of underpaid nodes, the current system will send approximately all payments to those nodes. This is problematic for anyone relying on income streams, which may halt until the distributions have a chance to catch up.

We discussed the idea of a more balanced “balanced” policy that pays people in proportion to how much they’ve been underpaid, as long as this proportion doesn’t veer too far from their current total cred. I commented that I think that this can be done without too much difficulty, with a pointwise correction for overpayment (through some non-linear activation; a sigmoid probably works nicely) followed by normalization.

I’ve now sketched this out. Here’s an interactive notebook: https://observablehq.com/@wchargin/balanced-distribution-sketch
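
The rough shape of the computation is something like this (simplified; the parameter names and exact formula here are illustrative rather than a copy of the notebook’s code):

```ts
// Illustrative sketch: sigmoid correction for over/underpayment, then normalize.
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function balancedAllocate(
  credShare: number[],  // each contributor's share of lifetime cred (sums to 1)
  grainShare: number[], // each contributor's share of grain paid so far (sums to 1)
  budget: number,       // grain to distribute this week
  k: number             // steepness of the correction (illustrative parameter)
): number[] {
  // Underpaid contributors (credShare > grainShare) get boosted; overpaid ones
  // get damped, but never all the way to zero, so the result stays continuous.
  const weights = credShare.map((c, i) => c * sigmoid(k * (c - grainShare[i])));
  const total = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => (total > 0 ? (budget * w) / total : 0));
}
```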


A key difference between this policy and the current implementation of the “balanced” policy is that this policy is continuous in both the effect size and the underpayment amounts, whereas the current “balanced” policy has a threshold past which all payments are just zero.

* Note: I haven’t carefully worked out the exact dynamics and necessary conditions here, but this seemed plausible to both of us on the call.

Disclaimer: With the current values of SourceCred on SourceCred, such a policy change would probably benefit me (@wchargin) personally. But, as discussed above, our motivations for this change are not self-serving.

5 Likes

I should look more carefully into these parameters and their implications. In particular, my comment that the output distribution tends to the underpayment distribution as k goes to infinity is not correct. There are surely variants on this with different properties both asymptotically and transiently, which we can tune to our liking. Take this primarily as a proof of concept that a better policy like this can exist without too much trouble.

@wchargin I’m not up enough on my maths to comment on the details there, but would you say your proposal is a rough proof-of-concept model for the (new) BALANCED policy proposal as outlined on our brand guidelines page*?

It’s called BALANCED there, but the current BALANCED as we have it today gets renamed to EQUALIZER in this proposal:

* Which I should really move to Discourse, haha.

Thanks for producing this prototype @wchargin. Feedback on the prototype itself is that the edge cases don’t work and the model is prone to spitting out NaNs, which makes it hard for me to experiment with it to get intuition for the algorithm. (The edge cases are often most instructive…) Also, it would be nice to be able to look at the 2-user case rather than the 4-user case; I think that the 2-user case can still express the interesting dynamics.

Feedback on the algorithm/signature: it would be nice if k had an intuitive significance, e.g. "k indicates the proportion by which someone’s share of the distribution may deviate from their share of the total cred in order to account for underpayment". In that case k=1 implies that we will pay exactly according to long term Cred, k=Infinity implies the policy currently called “BALANCED”, and intermediate values like k=2 or k=4 would be actually “balanced” approaches.
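
To make those semantics concrete, a sketch of what such a signature could look like (purely illustrative, not an implementation proposal):

```ts
// Sketch of the proposed k semantics: start from the pure-underpayment
// ("legacy BALANCED") shares and clamp each to within a factor k of cred share.
function clampedBalancedShares(
  credShare: number[],      // share of lifetime cred (sums to 1)
  underpaidShare: number[], // share of total underpayment (sums to 1)
  k: number                 // k = 1 => pay by cred; k = Infinity => legacy BALANCED
): number[] {
  const clamped = underpaidShare.map((u, i) => {
    const lo = credShare[i] / k; // 0 when k = Infinity
    const hi = Number.isFinite(k) ? credShare[i] * k : Infinity;
    return Math.min(Math.max(u, lo), hi);
  });
  const total = clamped.reduce((a, b) => a + b, 0);
  // Note: renormalizing can nudge shares slightly outside the [1/k, k] band;
  // a real implementation would have to decide how to handle that.
  if (total === 0) return credShare.slice();
  return clamped.map((x) => x / total);
}
```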

One implementation advantage of an approach that adds a parameter is that we can keep calling the policy “BALANCED” for backwards compatibility, and recommend non-extreme parameter choices like 2 and 4, but port the legacy “BALANCED” policies as having implicitly chosen Infinity as the parameter. (Then we might add in "EQUALIZER" as a moniker for balanced(Infinity).)

Agreed. I think we want to avoid overpaying early contributors, which arguably happens in many startups, where only the founder, VCs and first few employees get outsized returns, and the rest get crumbs or nothing (happened to me personally). However, you should be rewarded for being early. It’s higher risk, typically more trauma (therapy is expensive), and your contributions often lay the foundation for much of the value created later on.

There’s also an important incentive dynamic here. I’m much more likely to contribute to a new, risky project if there’s the possibility of big returns. If the project says, “well, this complicated algorithm that measures value may stop paying at some point if the project grows fast, and we can’t really estimate how much”, that’s not very motivating…If I get royalties forever, according to a fair valuation of the value I added, even if I eventually get diluted way down to a trickle of Grain, that’s very motivating. If the project truly blows up (e.g. next Google), then that highly diluted stream is still going to be worth a lot. As it should be.

The way I frame this in my own personal morality (not for everyone) is, if you want to never work again, sit on a beach in Thailand sipping Mai Tais, you better be able to look the hardworking Cambodian immigrant who will never retire in the eyes and know you paid for it.

Have been wondering about this issue, glad to see it addressed. The volatility here could be bad enough that it creates competitive, zero-sum dynamics, where people don’t want new contributors in the project just because they can’t risk not paying rent next month.

So, I’m sure this is a very nice algorithm, which addresses a lot of issues in an elegant way. I’m worried about the complexity, though. It’s already hard enough to explain this stuff to people. And the payout mechanism is something people will want to understand very well before using it for compensation. The educational barrier for many communities may just be too high. Even I, knowing this system very well and being pretty familiar with math, can’t really estimate how much Grain I’ll be making. Part of that is just that we’re early and prototyping. But considering the complexity of the core Cred calculation algo, I wonder, will anyone be able to understand the whole system well enough to make estimates and income guarantees to their contributors? Also, if something goes wrong, and the Cred scores and Grain distributions seem off, will communities be able to debug it? Or are they reliant on us, who may not be available for that?

What about something much simpler? Pay according to 80/20 lifetime Cred / last week Cred? Lifetime Cred never goes to zero. The ratio could be adjusted by communities to negotiate between old guard and new.
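
In code, that would be about as simple as it gets (illustrative sketch; the 80/20 split would be a community-chosen parameter):

```ts
// Illustrative sketch of the 80/20 idea: blend lifetime and last-week cred shares.
function blendedPayout(
  lifetimeCredShare: number[], // share of all-time cred, sums to 1
  lastWeekCredShare: number[], // share of last week's cred, sums to 1
  budget: number,
  lifetimeWeight: number = 0.8 // 0.8 lifetime / 0.2 last week; tunable per community
): number[] {
  return lifetimeCredShare.map(
    (l, i) => budget * (lifetimeWeight * l + (1 - lifetimeWeight) * lastWeekCredShare[i])
  );
}
```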

I know there’s this desire to replicate the stability of a traditional job process. We like you, so we’ll start paying you right away a living wage. But I wonder if this causes compromises that make the system less balanced over time. Perhaps this doesn’t need to be addressed in the core algorithm? Maybe communities can just create their own equivalents of PL’s sponsorship program?

Just some thoughts. Perhaps this algorithm is perfect for SourceCred. Just wondering how suitable it will be for other communities.

1 Like

Fixed NaNs (safe division; continuous). Added participant-count slider.

Yes, I like this. I’ve sketched out some variants with this property, but I’m not quite happy with them yet. I’ll keep iterating, and I’m also going to add an output view that shows how the allocation changes over time with successive distributions.

Well put. I wholeheartedly agree with your first two paragraphs.

I don’t think that this is meaningfully more complex than our current strategies. We need to be able to explain the concepts and the dynamics to everyone (incl. the mathematically disinclined), but not the formulas. My offhand comment there was for people who do want a high level view of the technical details without reading the code itself. (And, for what it’s worth, the current version of the code is basically one line.)

We need to be able to communicate with mathematical precision in some contexts and general accessibility in others. I don’t think that we should feel compelled to water down the algorithms just because not everyone will understand all the details.

This is, imho, the main point of the k-factor approximation. With that structure, we can say, “look, if you have 1000 cred, and the project has set k = 4, then you’re guaranteed to get at least 250 grain every week”. Of course, there’s still uncertainty in case your cred dilutes dramatically or the policies change. But in the usual case, this provides an easily understandable link between cred and grain.

This works nicely except in the case of large changes to the cred (e.g., weight changes) that may cause people to become underpaid or overpaid, or may correct such a discrepancy.

If you contribute primarily via (say) Discord, and we later realize that the Discord weights are too low, we want to “catch you up” to what you would have earned if the weights had been correct the whole time. This is important because it means that people don’t have to worry too much about arguing about the weights all the time. As long as you’re confident that the weights will be corrected in the future, and you’re not in immediate cash-flow trouble, you’re not penalized in the long run for waiting. Similarly, if you’re negatively affected by a weight change that later gets reverted, you’re caught up as if nothing happened. This should make the social dynamics around weight changes more collaborative and comfortable.

Can this actually be guaranteed? If there is a finite amount of Grain being distributed every Harvest, wouldn’t more new Participants in the project start earning enough Cred and Grain to slowly diminish this amount?

I like this part and find it intuitive and clear…

…while this leaves me with unanswered questions:

  1. If you have “over-earned” Cred (and subsequently, Grain), wouldn’t this mean that you might be presented with a sudden drop in Grain distributions?
  2. Wouldn’t that cause people with short-term cash flow needs to be hesitant if they don’t know reliably when and how Cred weighting might get adjusted?

Good catch; thank you. More accurately, this should be defined in terms of percentages. The above “guarantee” holds if the amount of grain distributed each week equals the amount of cred in the project as of that week. But more generally, it should be: “if you have 10% of the cred, and the project has set k = 4, then you’re guaranteed to get at least 2.5% of the grain every week”. And that we should be able to guarantee.

With more explanation this all sounds good. Will note that a cost of this complexity, even if worth it (which seems the case), is a higher educational barrier for those new to SourceCred, and more power/trust in those that can understand (and potentially modify) these algorithms. I think I can wrap my head around this (with a few more read throughs :sweat:), and explain the high-level concepts and dynamics in docs, other materials. What we’re really doing though, as with the main Cred algo I suppose, is creating guarantees about claims. None of which will be 100%. And, we’re bound to see those claims and guarantees tested, perhaps in adversarial environments where complexity increases the attack surface. That is the basic tradeoff we’re making, I believe.

1 Like

Realistically, the vast majority of participants aren’t going to understand the implementation of the algorithm. Rather than focusing on having a simple implementation, we should focus on having intuitive behavior. The “BALANCED” policy is simple, but sometimes results in huge and un-intuitive shifts in compensation (go from being paid a lot to being paid 0 due to a weights change). In contrast, the newer policy we’re working on here may be more complex, but I think the behavior will feel more intuitive to our users. (Go from being paid with a large multiplier to being paid with a smaller multiplier due to a weights change.)

It’s kind of like how the iPhone is massively more complex than a terminal, yet for 99% of people it’s easier to use and more intuitive.

3 Likes

So this is the basis of this thread and seems really hand-wavy. The thread feels in the weeds about implementing a change, without really justifying the change. The importance of communicating the abstracted semantics was discussed, but I don’t feel that that has been done sufficiently here to make this discussion (which has a broad impact) accessible.

“new grain is being minted and thus the old contributions are being proportionally underpaid”

but they aren’t being proportionately underpaid… right? Or is the issue that the balanced algorithm is greedy and pays the most underpaid fully first?

“If someone makes a valuable contribution early on, and that contribution continues to provide value to the project for years to come, the contributor should continue to earn grain for that contribution.”

Why is it desirable for old contributors to be guaranteed income if the cred inflation is outpacing the grain inflation? It seems to me like if we want old contributions to get credit for future contributions, or if we want early contributions to be weighted higher, we should solve that through the graph algorithm. An “equalizer” quality to distribution seems desirable and value aligned. I wouldn’t feel good investing critical dev effort into a change like this without it being better justified.

If a weight change occurs and I get paid nothing despite actively contributing, that’s because I was overpaid and I’m working off the debt. If that creates unacceptable income instability for active contributors, then combining balanced with immediate or recent seems like the obvious solution.

1 Like

My issue with the “old” balanced policy isn’t the issue of old contributions going un-paid. Rather, it’s that the balanced policy has really high volatility; small shifts in Cred result in giant shifts in compensation. Let’s look at this through some practical examples. Suppose we’re looking at a case where (very much like SC presently) there’s about 1M total Grain distributed, and we distribute 15,000 per week using the BALANCED policy.

Suppose we decide that (effective retroactively) we want to give 5% of the total Cred to co-communities like Maker, MetaGame, and 1hive, to recognize the value they provide by using SourceCred (and paying us). Then, from the perspective of the “balanced” algorithm, they’d be under-paid to the tune of 50,000 Grain – more than 3 full allocations of our current balanced policy. Thus, for 3 weeks, everyone else’s balanced income would abruptly drop towards 0 while the policy tried to catch up. In effect, the policy has levered a 5% change of Cred into 300% of a standard allocation.

This also creates a big reward for short-term attacks against the Cred scores. For example, suppose a clever attacker launches a bot attack just before a Grain distribution, and manages to claim 0.5% of the Cred. In the example above, they are now underpaid by 5,000 Grain, and are able to leverage a 0.5% Cred share into grabbing 33% of our weekly balanced distribution.

Under the semantics of the proposed replacement strategy, the fraction of the payout that I can earn is capped at a multiple of my lifetime Cred proportion. Suppose that we set m=3, i.e. people who are underpaid can get, at most, 3x their proportion of total Cred as a fraction of the total payout. In the example where we retroactively mint 5% for co-communities, those co-communities will get 15% of the total distribution for a while; significant, but it no longer causes a massive drop in everyone else’s comp. Likewise, the attacker who spikes their Cred to 0.5% can now get 1.5% of the weekly payout, or 225 Grain. No longer a giant issue.
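
Working through those numbers explicitly (same hypothetical round figures as above):

```ts
// Re-deriving the examples above (hypothetical round numbers).
const totalGrain = 1_000_000; // total Grain distributed so far
const weeklyBudget = 15_000;  // Grain per week via the BALANCED policy

// Retroactively minting 5% of Cred for co-communities:
const coCommunityDebt = 0.05 * totalGrain;             // 50,000 Grain underpaid
const weeksOfCatchUp = coCommunityDebt / weeklyBudget; // ≈ 3.3 weeks of ~everything

// An attacker who spikes to 0.5% of Cred right before a distribution:
const attackerDebt = 0.005 * totalGrain;                 // 5,000 Grain
const attackerShareLegacy = attackerDebt / weeklyBudget; // ≈ 0.33 of one payout

// With the proposed cap of m = 3x lifetime Cred share:
const m = 3;
const coCommunityShareCapped = m * 0.05;              // 15% of each payout
const attackerGrainCapped = m * 0.005 * weeklyBudget; // 225 Grain per week
```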

We’ve chatted a bunch about the need for SC mechanisms to be interpretable and intuitive. The semantics of the existing mechanism are: “your share of the balanced payout may fluctuate wildly week to week, based on variables that are mostly outside of your control”. The semantics of the proposed new mechanism are: “your share of the balanced payout will usually be about equal to your share of the total cred, but it may vary by up to 3x depending on if you are overpaid compared to others”. Thus, I think this would also be a big improvement towards making SC comp more stable and easier to forecast.


To see some real examples, look at how much volatility there’s been in the last 3 balanced distributions:

(screenshots of the 11/30, 11/23, and 11/14 balanced distributions)

Just to use myself as an example, in 11/14 I get 5.5k and then in 11/23 I get 254. Why did that happen? I genuinely don’t know. Several other ppl see their comp bounce up and down by factors of 10x. The balanced algorithm is spooky.


Takeaways: Thanks for posting and asking for more justification @blueridger, I surprised myself with how strong a case against the BALANCED algorithm I was able to make. @Bex has been poking me lately asking why Grain comp fluctuates in an unpredictable way that seems unconnected to people’s contributions, and I think the BALANCED payouts are the issue. In a formal sense, I think the BALANCED strategy is an under-damped control mechanism, and by virtue of being under-damped it induces oscillatory behavior that has little to do with the actual Cred scores.

I propose we get a solid replacement, and call it the “lifetime” mechanism. The rule of the lifetime mechanism is that it pays everyone proportional to their lifetime Cred, with a configurable adjustment factor based on over/under-payment. Then between “recent” and “lifetime” payments, we have a really nice way to dial between paying salaries to recent contributors to keep the ship afloat, and paying lifetime amounts for a rising tide that lifts every boat.

3 Likes

Ok, I think the need to tweak something is fairly well justified, but I’m still a little bewildered by the lack of moral justification / semantic explanation provided for the alternatives.

A combination of “lifetime” and “recent” for example, would necessarily mean that old contributors who are no longer active but who are retroactively acknowledged as underpaid by a config change would, if my mental math is correct, literally NEVER achieve a % of total grain representative of their % of total cred. This is because the recent policy wouldn’t pay for old contributions, and a lifetime policy would make their % of total grain approach their % of total cred as time goes to infinity–their fair share would be asymptotic.
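
To spell out the asymptote (symbols here are just for illustration): suppose such a contributor holds a fixed fraction $c$ of lifetime cred, has so far received $g_0$ grain out of $G_0$ distributed in total (with $g_0 < c\,G_0$, i.e. they are underpaid), and each later harvest distributes $d$ grain, of which a pure lifetime policy pays them $c\,d$. After $t$ more harvests their cumulative share is

$$\frac{g_0 + c\,d\,t}{G_0 + d\,t} \;\longrightarrow\; c \quad (t \to \infty),$$

which approaches $c$ from below but never reaches it, so the historical underpayment is never actually repaid.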

One glaring issue that I see with a simple hard cap like the one described above is that in a mature community that is primarily maintained by non-founding contributors, it could prevent the new maintainers from getting their fair share and instead continuously overpay old, inactive cred whales.

I get that the current balanced policy is volatile, but it currently is the only mechanism that represents the ethic that people’s % of total grain should ideally be their % of total cred, and that it should balance towards that as quickly as the community can afford. This is an AWESOME ethic that I think our new policy should maintain as best as possible.

I have two ideas off the top of my head for improved balanced policies. I’m going to shift the language in my discussion of them to center underpaid contributors (which includes active contributors), using the term “debts” for the amounts they are owed. I will describe the effect of new grain being paid out proportional to total cred as “dividends”. I will describe the “continuous” non-zero distribution effect described by @wchargin as “non-neglectful”.

First, a Recency-Biased Balanced Policy.

This policy would use the new CredGrainView technology we are currently building to place debts in time, and then have some sort of slight (non-neglectful) bias towards more recent debts. Evaluation:

  • [YES] Active contributors would have improved income stability.
  • [POSSIBLY] Historically overpaid active contributors would have improved income stability.
  • [YES] Retroactive adjustments would generally get fixed in a finite amount of time.
  • [YES] All debts would be paid before dividends begin.
  • [NO] Mitigates the effect of short-term attacks on other debts being paid.
  • [NO] Short-term attacks are disincentivized.

Second, a Soft-Cap Balanced Policy.

This policy would cap the % of the payout someone can earn at a multiple of their % of total cred, as described in above comments, but may conditionally lift the cap according to this algorithm:

  1. Debts are paid with cap applied.
  2. If there is grain left, debts are paid without cap applied. (Or a smarter version: iterate with increasingly high caps until all debts are paid).
  3. If there is grain left, dividends are paid.
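
For concreteness, a rough sketch of that cap-lifting loop (function and parameter names are hypothetical, and the “increasingly high caps” step here just doubles the cap each pass):

```ts
// Rough sketch of the soft-cap idea (hypothetical names; not an implementation).
function softCapAllocate(
  credShare: number[], // share of lifetime cred, sums to 1
  debt: number[],      // grain each contributor is underpaid by (>= 0)
  budget: number,      // grain to distribute this harvest
  capMultiple: number  // initial cap, as a multiple of the cred-proportional payout
): number[] {
  const paid = credShare.map(() => 0);
  let remaining = budget;
  let cap = capMultiple;

  // Steps 1-2: pay debts under the cap, lifting (doubling) the cap while
  // budget remains and debts are still unpaid.
  while (remaining > 1e-9 && debt.some((d, i) => d - paid[i] > 1e-9)) {
    const want = debt.map((d, i) =>
      Math.min(d - paid[i], cap * credShare[i] * budget)
    );
    const totalWant = want.reduce((a, b) => a + b, 0);
    if (totalWant <= 1e-9) break; // e.g. remaining debtors have ~zero cred
    const scale = Math.min(1, remaining / totalWant);
    want.forEach((w, i) => (paid[i] += w * scale));
    remaining -= totalWant * scale;
    cap *= 2;
  }

  // Step 3: any leftover budget is paid as dividends, proportional to cred.
  if (remaining > 1e-9) {
    credShare.forEach((c, i) => (paid[i] += remaining * c));
  }
  return paid;
}
```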

It would also be non-neglectful when paying debts. Evaluation:

  • [YES] Active contributors would have improved income stability.
  • [NO] Historically overpaid active contributors would have improved income stability.
  • [YES] Retroactive adjustments would generally get fixed in a finite amount of time.
  • [YES] All debts would be paid before dividends begin.
  • [YES] Mitigates the effect of short-term attacks on other debts being paid.
  • [MOSTLY] Short-term attacks are disincentivized.

Conclusions

  • Let’s get clear on what our goals are and what will fulfill those goals. I’ve offered my suggestions for goals in the above evaluations.
  • These two suggested policies could probably be merged and applied in a single super smart policy.
  • I think we have consensus that recency is important for active contributor stability. That could come from something like the Recency-Biased Balanced Policy or from pairing the Balanced policy with the Recent policy.
3 Likes

Thanks for the detail in your posts @blueridger and @decentralion. Watching this discussion and having pondered the Balanced policy myself, I have some high level thoughts.

I think we all agree on @decentralion’s case for why the policy needs changing, and I think that @blueridger hits the nail on the head in regards to the conversation so far being a bit out of touch with normal, intuitive semantics.

I suggest that we ditch any thinking about how to implement the new policy, and first gain strong consensus around what we want to build (and more importantly IMO: what we don’t want Balanced to do).

I suggest we include more of the non-dev community to (1) force us to perfect the semantics and rationale and (2) hold us off from talking implementation until we have strong consensus.

This is a complicated topic with a lot of perspectives, all of which contain wisdom/information we should aggregate. This isn’t usually my style, but I think it could be useful to do a session on Miro where we source thoughts/concerns/info from the community.

1 Like

Perhaps this is a second candidate for a Design Sprint! More reason for me to get going on my tasks to document and justify that process.

I’m generally in agreement with all you wrote, @blueridger, but this paragraph confuses me. What you’re describing is what is currently happening—BALANCED is occasionally rewarding a lot of Grain to people who’ve not been active in 6+ months simply because their lifetime Cred is still a pretty high % overall, from a time when far fewer people were involved in the project.

I do feel like your two alternative Balanced policy proposals would do a better job with this, though.

I want to push back on this a little. I think we’re better off prioritizing recency by using the Immediate and Recent policies (i.e. shifting more of the grain budget to them). I see the role of Balanced, if anything, as having a bias towards older contributors.

Are we trying to do too much in one policy?

I also wanted to suggest that the best “Balanced” policy might lie in the ensemble of policies that are used together, along with their parameters, rather than in trying to build the perfect Balanced policy. Good Recent and Immediate policies will keep the current contributors energized, and policies like Balanced (which is more like “Underpaid”) try to play catch up. My take is many simple policies that can be aggregated differently for different communities/values/goals would be the most antifragile way to go. Rather than dilly-dallying over this, we can just build a few variants and try them out.

1 Like

I agree with @eeli that a combination of policies is probably a cleaner and more effective approach. Perhaps in our current implementation, we are simply over-distributing Grain in the Balanced policy and should shift it more towards something like what @befitsandpiper mentioned, 1Hive’s 8/25 split favoring Recent over Balanced.

I sketched out a simple variant of Balanced called “Underpaid” in an effort to (1) make sure that large grain underpayment compensations occur over time instead of eating everyone’s lunch for a few weeks and (2) bias towards mid-tier contributors.

You can check out some examples in relation to the current Balanced policy here.

Essentially it:

  1. Takes the N-th root (we choose N, deciding how progressive of a “tax”) of the amount of grain owed to each contributor.
  2. Filters out contributors below some threshold T (which we can decide).

Explaining (1)

By taking the N-th root of how underpaid someone is, we’re scaling down the biggest underpayments relative to everyone else. In English, it’s “we want to pay you that, but we have a lot of other people that need to be paid”. Simple enough.

Explaining (2)

The threshold says “pay people with at least T grain”. When we use quadratic scaling, a large part of the budget begins to get eaten up by many small contributors. You can see an example here where half the budget gets eaten up by small contributors. These smaller contributors are not neglected, but simply will get paid once they have a non-negligible amount. This helps us focus the budget on catching up more engaged and previously engaged contributors.

Note that this also would be more dramatic in the case of many real communities, which will have many many people peripherally participating or receiving minor grain payments.

Working out the threshold is another conversation in itself, but I just want to give intuition behind the decision here. I recommend checking out the google sheet with examples to get a sense of how it works.
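
For anyone who wants the gist without opening the sheet, the core of it is roughly this (sketch only; the normalization to the weekly budget is my own framing, and the sheet may handle details differently):

```ts
// Rough sketch of the "Underpaid" variant (illustrative, not the sheet's exact math).
function underpaidAllocate(
  owed: number[],    // grain each contributor is currently underpaid by (>= 0)
  budget: number,    // grain to distribute this harvest
  n: number,         // N: higher values flatten the largest debts more ("progressive tax")
  threshold: number  // T: contributors owed less than T are skipped this harvest
): number[] {
  const weights = owed.map((o) => (o >= threshold ? Math.pow(o, 1 / n) : 0));
  const total = weights.reduce((a, b) => a + b, 0);
  if (total === 0) return owed.map(() => 0);
  return weights.map((w) => (budget * w) / total);
}
```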

I recently learned the concept of a PID Controller; there’s a really nice explanation of how they work in section 4.1 of the RAI whitepaper. I think we should probably just use a PID controller as a replacement for the “underpaid” strategy (which is itself just a P-controller, i.e. with 0 weight on the integral or the derivative).

No need to re-invent a wheel here. :slight_smile:
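
For intuition, a toy version of that idea might look something like this (the gains kP, kI, kD are made-up knobs, and the current behavior corresponds roughly to kP > 0 with kI = kD = 0):

```ts
// Toy sketch: treat each contributor's underpayment as the error signal of a
// per-contributor PID controller, then split the budget by the resulting weights.
type PidState = { integral: number; lastError: number };

function pidWeight(
  underpayment: number, // error: target grain minus grain actually received
  state: PidState,
  kP: number,
  kI: number,
  kD: number
): { weight: number; next: PidState } {
  const integral = state.integral + underpayment;
  const derivative = underpayment - state.lastError;
  const weight = Math.max(0, kP * underpayment + kI * integral + kD * derivative);
  return { weight, next: { integral, lastError: underpayment } };
}
```

Each harvest would update the per-contributor state and then pay out the budget proportionally to the non-negative weights.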

1 Like