Minting Discourse Cred on Likes, not Posts

decentralion · January 23, 2020, 7:56pm

Currently, we mint Discourse cred for activity; every topic and post creates new cred. I recently threw together a prototype which allows us to mint cred based on likes instead of posts. This post explores how this changes our cred.

In keeping with the practices we came up with during the CredSperiment weight changes thread, I’m not going to mention any individuals by name. However, the analysis happens at the level of individuals, so you can argue this is just a fig leaf. Let me know if you see a better way to handle this discussion, and, as always, please be respectful and considerate towards your fellow cred-tributors.

First, I’ll link the two prototypes of how our (Discourse-only) cred looks with this change. We have:

The first thing to note is that the default cred graph is a lot “spikier”:

whereas the cred-on-likes graph stays a lot more consistent over time:

Basically what we’re seeing here is that when there is a big spike in activity (more posts and topics), it doesn’t necessarily result in a spike of likes, so the second approach is a little more “conservative” in cred generation. (Although, counterfactually: someone could create new accounts and then like hundreds of posts, which would create a cred spike of its own.)

Diving deeper, we can look at how the cred distributions compare at the individual level, alongside statistics on number of topics, posts, and likes received. Here they are for the top 8 Discourse contributors by cred (names elided):

User	Default Cred %	Likes Cred %	Cred Change	Topics %	Posts %	Likes %	Likes per Post
1	35%	38%	7%	39%	29%	42%	1.3
2	31%	23%	-26%	34%	32%	16%	0.5
3	13%	15%	12%	13%	17%	16%	0.9
4	10%	12%	21%	7%	13%	13%	0.9
5	4%	5%	30%	4%	1%	4%	2.7
6	3%	4%	22%	2%	3%	4%	1.3
7	3%	3%	1%	2%	2%	2%	0.6
8	2%	2%	16%	1%	2%	2%	1.1
sum or average	100%	100%		100%	100%	100%	0.9

Basically, the big shift involves a prolific contributor who made up a large share of the raw activity but a smaller share of the likes (as seen by their likes-per-post being about half of the average). Changing to cred-on-likes brings their cred share closer to their share of likes received; this shifts cred towards the other contributors, as seen by everyone else having a positive cred change.
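A minimal sketch of how the "Cred Change" column can be read, assuming it is the relative change between the two cred shares (that reading is an assumption, not something stated above). The shares below are the rounded percentages from the table for users 1 to 4, so the recomputed values only approximate the published column.

```python
# Rounded cred shares (in percent) copied from the table above.
default_share = {1: 35, 2: 31, 3: 13, 4: 10}
likes_share = {1: 38, 2: 23, 3: 15, 4: 12}


def relative_change(before: float, after: float) -> float:
    """Change in cred share, expressed as a fraction of the original share."""
    return (after - before) / before


# Assumed definition of "Cred Change": (likes share - default share) / default share.
changes = {
    user: relative_change(default_share[user], likes_share[user])
    for user in default_share
}

for user, change in changes.items():
    print(f"user {user}: {change:+.0%}")
```

Because the inputs are already rounded, small differences from the table are expected; the sign and rough size of each change are the point.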

Overall, I think this is a constructive change to cred. It brings the principle of review culture to our Discourse cred; a like is an implicit, low-friction form of review. This is potentially vulnerable to gaming; for example, someone could register a dozen new accounts and use them to mint a lot of new cred. However, Discourse already has a built-in concept of trust levels that we could apply here. We could adjust the heuristic so that, for example:

Trust Level	Minted cred per like
0	0
1	0
2	1
3	2
4	3

Note that likes by L0 or L1 users would still flow cred, they just wouldn’t mint any cred. If you read over the docs linked above, you’ll see that getting to L2 (the first level that can mint any cred) requires visiting on at least 15 separate days, spending an hour reading posts, receiving 1 like, etc. I think this is a sufficient barrier to deter gaming, at least for our current community trust level.
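As a rough sketch of the heuristic in the table above: the names `MINT_PER_LIKE` and `minted_cred` are hypothetical and not taken from the prototype, and the weights are just the example values from the table.

```python
# Example weights from the table above: trust level -> cred minted per like.
MINT_PER_LIKE = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}


def minted_cred(liker_trust_levels):
    """Total cred minted for one post, given the trust level of each liker."""
    return sum(MINT_PER_LIKE[level] for level in liker_trust_levels)


# A dozen freshly registered accounts (trust level 0) mint nothing.
sockpuppet_likes = [0] * 12
# Three established members at trust levels 2, 3 and 4.
member_likes = [2, 3, 4]

print(minted_cred(sockpuppet_likes), minted_cred(member_likes))
```

This only covers minting. Likes from L0 and L1 users would still create edges that flow cred, which the sketch does not model.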

Beanow · January 23, 2020, 8:17pm

Random idea, maybe the total Cred per Topic + included Posts could be cool to look at. Bonus points for sorting the Topics by total Likes on that topic + posts.

It isn’t ideal, but it does shift from needing a mental image of a person’s expected value to needing one of a topic’s expected value. And the latter is easier to verify :]

Beanow · January 23, 2020, 8:57pm

Very interesting! I find myself wanting to dig some more, but have difficulty achieving this. I think this should be possible with cred.json, but Sublime isn’t too happy with me editing a 221k-line re-indented JSON file.

Applying different canonical weights (found some I like in the explorer).
Limiting timeline to a certain interval.

Please hold … * elevator music *

decentralion · January 23, 2020, 9:15pm

Please be aware that the cred.json file is currently filtered so that it only includes the 100 top scoring nodes of each type (across all time). This means if you e.g. filter down to a particular interval, you may get extremely unrepresentative results, because the top nodes from that interval might not have been top across all time.
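A toy illustration of the pitfall, using made-up nodes and a cutoff of 3 instead of 100. The tuple layout is hypothetical and is not the real cred.json format.

```python
# Toy data: (node_id, interval, cred). All values are made up.
nodes = [
    ("a", "2019-12", 90), ("b", "2019-12", 80), ("c", "2019-12", 70),
    ("d", "2019-09", 10), ("e", "2019-09", 8), ("f", "2019-09", 5),
]


def top_n(items, n):
    """Keep the n highest-cred nodes."""
    return sorted(items, key=lambda node: node[2], reverse=True)[:n]


def in_interval(items, interval):
    """Keep only the nodes that belong to the given interval."""
    return [node for node in items if node[1] == interval]


# Reduce to the all-time top nodes first, then crop to an interval.
reduced_then_cropped = in_interval(top_n(nodes, 3), "2019-09")
# Crop to the interval first, then take the top nodes within it.
cropped_then_reduced = top_n(in_interval(nodes, "2019-09"), 3)

print(reduced_then_cropped)
print(cropped_then_reduced)
```

In this toy case no September node survives the all-time cut, so cropping the reduced data returns nothing, while cropping the full data returns the nodes that actually mattered in that interval.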

I’ve created a branch which disables this behavior. I used this branch for the cred analysis I published last week. If you want to do an ad hoc analysis, I recommend using that notebook as a starting point, and I recommend re-computing the cred with the no-reduce-size commit added. (I’m planning to PR that commit to master for just this reason.)

Beanow · January 23, 2020, 10:47pm

So I managed to get something working! https://github.com/Beanow/discourse-likes-comparison

Full timeline instances

Using canonical weights

First, I noticed the baseline weights were the default weights, not our canonical cred. Note I deviated from our canonical cred on one aspect, in that I removed specific node weights. A few topics are manually increased in our canonical, I removed that here. (All weights in the repo)

Stronger likes weights

An alternative to the PR is to try increasing the edge weights of likes (from 2x to 32x). I’ve also flipped Topic and Post weights around from (Tx8, Px2) to (Tx2, Px4).

Mixed likes and activity

Then I wondered, what about both activity-Cred (topics/posts) and like-Cred? And fiddled with weights again. Here I did bring down the activity cred a lot, but it isn’t completely gone.

All likes cred

Finally, I made one that sets all activity cred nodes to 0x. It is still different from your reference, though, in that our canonical cred had tweaks to other edge weights.

Cropped timeline instances

I noticed that it was very hard to see what the differences were before December 2019, because December has the peak in it. The peak makes everything else smaller, so I couldn’t see what changed besides the peak.

Here are variations of the same weights, but limited to pre-December.

Edit: found out these don’t allow recalculating without bringing the rest of the history back, sorry, out of time to fix today.

Beanow · January 23, 2020, 11:11pm

Something that stands out to me from tinkering with these variations.

Initial forum setup

At the very start of the graph, creating the forums takes a similar big hit, because there weren’t any users to like anything.

“Activity cred; Using canonical weights” || “Likes cred; All likes cred”

Early adoption

Very similar to the initial setup. The early periods of using the forums take a big dive, all the way until about September-ish.

“Activity cred; Using canonical weights” || “Likes cred; All likes cred”

Having them be not 0-Cred

Both of these benefit a lot from the “Likes cred; Mixed likes and activity” in the sense they’re now not completely deprived of Cred.

“Likes cred; Mixed likes and activity” || “Likes cred; All likes cred”

brianlitwin · January 24, 2020, 4:52pm

I appreciate how @decentralion explained that the new distribution aligned with the “Likes per Post” variable, which seems like a positive property for the algo to reflect.

One of the risks I see in iterating on the algo is updating it to reflect outputs we’d prefer, instead of updating it to reflect patterns or properties we’ve identified. For example, if the result of this experiment had produced some off-putting result, such as consolidating the Cred in one User, there would still be a strong argument that we should apply the change, and the discussion would be on the grounds of whether “Likes” are good Cred signal.

On the topic of whether “Likes” are good signal, I see something asymmetrical in the amount of effort it takes to create a post, versus the amount of effort it takes to Like a post. It’s a really weak form of engagement. Imo, Post authors aiming for Likes is pretty poor incentive. Reservations aside, as part of the Credsperiment it feels like a solid, incremental improvement.

s_ben · January 25, 2020, 7:15am

Like minting seems to reflect reality a bit better IMO. Also really digging the analysis and visualizations!

As for the Discourse trust levels, I think they are well designed moderation tools (i.e. for permissions to do things in Discourse). And therefore could improve the quality of inputs (e.g. reduce spam). Not minting cred because of a low trust level could also make sense. But wary of each level minting more cred. Could concentrate power to those at the higher levels. I think this will happen naturally anyway in most projects, as some contributors will create more value. No need to amplify this unless we see a need for it in practice.

Beanow · January 25, 2020, 6:47pm

Additional experiments aside, a more opinionated take from me.

The graph visualization can be misleading, because it only shows the top 6 all-time Cred users. All other users were in my blind spot as I used it to get an intuition for what the change does. For me that’s a big caveat, so I’m treating the results as back-of-the-envelope grade, rather than proper research.

From a theoretical point of view, Likes seem to have their pros and cons. A big pro seems to be that in a high trust environment I’m expecting a better signal to noise*, in that Likes will identify some of the high value contributions better. A big downside I’m worried about is confirmation bias (e.g. conforming to the local memes to get more likes).

Another downside I’m expecting is weaker “permissionless” support compared to activity minting, because now unnoticed efforts will not have a basic amount of Cred. The current system is more forgiving in that respect. Without likes you have some Cred, with likes you have more Cred.

Neither approach is gaming-proofed and both are just a proxy for value, not a direct measure of value. So I have the same reservations as @brianlitwin that fitting closer to the Likes signal doesn’t mean it’s safe to conclude that’s a better Cred evaluation.

Also feel that “dampening the peak” is not an indicator we should focus on. There seems to be a consensus that it’s relatively highly valued. So it’s tempting to consider its relative reduction an improvement. Especially when it’s this prominent in the visualization. But it’s a single scenario, while we could analyze a few dozen Initiatives as well. I would rather see if the change is an improvement to a majority of those, and apply a bandaid (e.g. manual weights) if this scenario warrants it, than the other way around.

On the positive side though, I feel like the version where I tried to increase Like edge weights as high as 32x was hardly different from the canonical weights version. So the code change to support minting on Likes adds a much more powerful new lever.

Bottom line for me

We are conducting an experiment after all, so I’m happy to try it out. The analysis isn’t very strong atm, but the added capability seems useful.

Until better research proves me wrong, I have a strong preference to keep activity-minting a non-trivial amount as well. (More than 1/2x and 1/4x). Because I believe the permissionless, unnoticed contributions are essential, which we should provide a solid source of Cred for.

*

decentralion · January 25, 2020, 8:54pm

This is an interesting point. In general, I don’t think that we should be aiming for symmetry in effort to create something vs effort to evaluate it. As an example, for a critic to review a film takes orders of magnitude less effort than creating the film; I don’t think this is an indictment of film criticism. In computer science, lots of algorithms have the property that you can quickly validate the quality of a solution; this is considered a Very Good Thing.

Aiming for likes is an imperfect incentive, but I think it’s clearly better than post authors aiming for raw # of posts.

This is a good point; thanks for raising it.

As the trust level decreases, I think the value of a like decays more slowly than the value of a raw post. A very simplistic framing: getting low-quality posts requires one noisy actor, getting likes on low-quality posts requires two noisy actors. So I think Likes are more robust in the low-trust regime. (Far from perfect–but an incremental improvement.)
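To put rough numbers on the one-actor versus two-actor framing, here is a deliberately simplistic sketch. It assumes a fraction `p` of participants are noisy and that authors and likers act independently; both are illustrative assumptions, and the function names are made up for this example.

```python
def noisy_post_rate(p: float) -> float:
    """Chance that a post comes from a noisy author (one noisy actor needed)."""
    return p


def noisy_liked_post_rate(p: float) -> float:
    """Chance that a post has a noisy author AND its like comes from a noisy liker.

    Assumes the author and the liker are drawn independently.
    """
    return p * p


for p in (0.05, 0.2, 0.5):
    print(f"p={p}: post={noisy_post_rate(p):.4f}, liked post={noisy_liked_post_rate(p):.4f}")
```

For any `p` strictly between 0 and 1 the second rate is lower than the first. The independence assumption stops holding if noisy actors coordinate, so this is an upper bound on the benefit, not a guarantee.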

It’s important that we recognize and reward valuable effort that might go un-noticed. The key word is “valuable”; a lot of unnoticed or unacknowledged effort is likely to go unnoticed or unacknowledged because it isn’t valuable. Therefore, the solution is not to mint cred for all effort, but to get better practices and incentives for noticing the valuable effort. An example of this would be having a weekly community cred review, during which we create manual nodes (and boost/like them) that represent effort that wasn’t represented.

FWIW, this change was originally motivated by an engagement with a partner project; feedback on the initial cred distribution was that people who generate a lot of noise but little value were getting too much cred. Feedback on this change (as applied to their project) was positive. So this is at least n=2 for this change seeming like an

Bottom line for me: I think this change is a good one, both empirically (across two different projects) and theoretically. A number of potential reservations around using likes as a signal have been raised, and they’re good points–however, I think the corresponding weaknesses of minting on raw activity are even stronger.

It’s interesting to imagine making this change in reverse: suppose we had been minting cred on likes the whole time, and now someone proposed minting based on raw # of posts and topics instead. Would anyone advocate for that change?

As TBD, I’m inclined to go ahead with this change, but I’m going to leave this discussion open a little longer so that more people have a chance to chime in and participate. Also, I want to see if I can get a cleaner implementation (if you look at the review for the prototype, you’ll see that this implementation is pretty hacky).

Beanow · January 26, 2020, 4:37pm

I definitely disagree with this. Efforts can also go unacknowledged because they are low-profile but good value. Or because posts have a niche audience. Or because posts are part of a ritual (lacking novelty to get likes). Or because you’re new to the project, so you don’t have the clout to have people read your post.

The medium also matters. For instance GitHub usage in our community is niche by default. Minting on GitHub reactions only would deflate many great contributions there.

This is why I think Likes are good at signalling some types of high value content. But certainly not all.

In this regard @s_ben has brought up this notion that no contribution should be 0-Cred. And I think this is a case where it very much applies. The mixed activity-minting + likes-minting satisfies that, while still making use of some of the nice properties of likes-minting.

That said, this is technically a weights discussion. The PR would allow both likes-minting only and mixed minting. So I’m in favor of making the change. But think mixed minting weights would be better.
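A small sketch of the difference between likes-only and mixed minting, as a pure weights choice. The function name and the weight values are hypothetical and are not the ones used in the PR or in the experiments above.

```python
def minted_for_post(num_likes, activity_weight, like_weight):
    """Cred minted for one post: a flat activity amount plus an amount per like."""
    return activity_weight + like_weight * num_likes


# Hypothetical weight settings, for illustration only.
likes_only = {"activity_weight": 0.0, "like_weight": 1.0}
mixed = {"activity_weight": 0.5, "like_weight": 1.0}

for likes in (0, 3):
    print(likes, minted_for_post(likes, **likes_only), minted_for_post(likes, **mixed))
```

With the mixed setting, a post with zero likes still mints a non-zero amount, which is the "no contribution should be 0-Cred" property; with the likes-only setting it mints nothing.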

s_ben · January 26, 2020, 8:43pm

While I think like minting is a step in the right direction (or at least worth exploring), I share @Beanow’s concerns here. In particular the bandwidth issue. I can barely keep up with Discourse alone these days (taking a break from my Sunday to drop in and write this because I think it’s important, but I’m sure I’m missing other things), and we’re still a relatively small group. We could easily see a situation where people’s efforts just go unnoticed and they give up before getting meaningful engagement.

There is also the problem of collusion, which we’ve acknowledged is the main attack vector for using likes. It’s easy to imagine a situation where a “cred elite” quickly forms, and uses not liking to gatekeep for profit, excluding people. In fact, between researching my article on reputation systems and DAOs and my experience so far in them, I’ve come to the conclusion that all the meatspace problems (gossiping, cliques, coercive power relations, etc.) not only show up in virtual spaces, but are often amplified. When reputation systems based on likes are scaled out to millions (Twitter, Facebook, IG, etc.), I think we can all agree some nasty dynamics develop, including winner-take-all markets where the average contributor has little to no real power. We also want to protect “dissident” voices. If an unpopular opinion or idea is raised, we don’t want the person raising it to have their income censored (the most effective way to silence people). What if a female in a predominantly male Cred instance started voicing concerns about sexism? Would this system allow the male leadership to effectively reduce the woman’s income to zero by not liking their contributions? Even if the chance of that happening is slim, the perception that it might could lead to unnecessary self-censoring and yes-manning/womaning/personing.

That said, I still think likes is an important signal to add. Likes seem like the best way for groups to reach consensus on individual contributions (we can’t, after all, have a community-wide vote on the value of every little contribution). There’s a reason Twitter/Facebook/IG/Google have taken over the world (and in an indirect way funded SourceCred via employment I might add). An example I draw inspiration from is Crypto Twitter (CT). Much as people bitch about it, I’ve come to view it as a revolutionary way to reach consensus on scales never seen before. It’s a consensus-reaching machine. In fact, some argue many crypto projects that lack formal governance (e.g. Bitcoin) basically are governed on the social layer, largely on Twitter. There’s some truth to that IMO. I suspect even that one reason democracies are cracking is that social media has become the main governance mechanism, and we’re just slowly waking up to that. But I digress…

I also think we can learn from other systems how to mitigate the downsides and toxicity of like-based consensus. For instance, Twitter’s main defense against unhealthy dynamics (that doesn’t use proprietary closed-source AI) is blocking/muting/unfollowing and human moderation. Twitter also just started research into decentralizing its network, allowing individuals or smaller groups to determine their own feed rules (a strategy discussed for SC in a community call a couple weeks ago (and elsewhere)). Smaller SC communities, or teams within larger communities making their own custom rules, could be an effective way to mitigate these issues. We could even end up front-running these larger companies by being able to iterate faster.

To elaborate on this, a longer, less “mem’y” definition of this idea for me is, “No contribution that the community sees as valuable over time should not have a node in the graph (w/ positive score)”. The danger we want to avoid here is, by rewarding everything, we reward undesirable behavior. For instance, a well-meaning but low-skill contributor comes in and starts generating a high volume of posts, which are not appreciated and a distraction to other contributors. The community is friendly and welcoming (maybe), but the contributor isn’t willing or able to conform to the community’s norms. They don’t take cues on the social layer, and just keep firehosing the community with mini manifestos, slowing down work. They’re not explicitly breaking any rules that would cause a ban (which should be generally reserved for extreme cases). The community doesn’t want to be mean. In this case, if the contributor is being paid for activity, they will only produce more posts. Another example could be a contributor that is contributing valuable work, but violating the code of conduct. In many (if not most) situations, it won’t be an extreme case, where someone is straight up hateful or trolling. It could be a situation where some view the person as problematic, and others don’t.

One helpful distinction IMO is: you can’t use people’s work without flowing them cred. If, in the messy grey areas and politics of a Cred instance, a contributor is pushed out of the graph (which will be necessary at times), the community can’t turn around and use their work without crediting them (which, on principle, should flow a non-zero amount of cred). This serves as a pragmatic signal to distinguish nodes that should legitimately be 0, and those that have been treated unfairly. You can’t reject the person and use their work without compensation. This ties back into "Cred Historians" and Curators, which may be something to prioritize if we pursue like-based cred.

I think having some amount of activity minting (or at least having it in the codebase so we have optionality around adding it later) is a good way to mitigate a lot of these issues, and gives us tools that could be valuable as we iterate and learn. Mixed activity/like minting cred (which is the PR now as I understand it), seems like a good way forward.

decentralion · January 28, 2020, 1:53am

I’d like to more clearly explain why I want to move away from minting cred directly to posts, in favor of minting on likes.

SourceCred is already built around the assumption that every contribution is potentially valuable. This is the whole idea of cred. Any contribution could get depended on, referenced, or boosted, in which case it will receive cred. This is the core idea of the SourceCred algorithm.

Cred minting is different. Minting is a declaration, “we believe this thing is intrinsically valuable, and therefore we will make it a source of cred”.

Right now, we mint cred to posts. Thus, we intrinsically assume that every post is intrinsically valuable, and the author should get directly rewarded, regardless of any other signal.

This is simply not a realistic assumption. It is not true that every single post on the internet is valuable. Some posts are pure noise (less than 0 value due to friction), others are actively inflammatory or trolling (really big negative value). Also, producing valuable posts requires a lot of cognitive effort, whereas low-value posts can be (and are) churned out en masse by bots; so if we reward people just for creating posts, the system will get swamped. For this reason, minting cred directly to posts is an incentive vulnerability.

Our community has stayed healthy in the presence of this vulnerability, because the community is small and cohesive, and because we’ve had active regulation at the social layer of the project. However, as the project grows, we need to patch this vulnerability. More immediately: we’re discussing rolling out SourceCred to another community that does have contributors that produce a flurry of low-effort activity that doesn’t appear to add any value to their discussions. For SourceCred to function well for that community, we need to do better.

Minting cred directly to likes clearly has its issues. From first principles, the idea that a like is intrinsically valuable is at least as absurd as the idea that a post is intrinsically valuable. However, it makes the vulnerability less exploitable: now you need two actors who are actively collaborating, rather than a single actor. And it functions better as a heuristic: there’s a meaningful correlation between the # of likes a post receives and its value.

As a do-it-yourself experiment to compare and contrast the value of posts in general vs highly liked posts, go to your favorite YouTube video or Reddit thread, set the comments to “newest first”, and assess the average value. Now change it to “highly liked/upvoted” and assess the average value. I think it will be clear which heuristic works better.

That said: this is also a temporary solution. In the long term, we won’t be minting based on such crude signals. In the future, I’d like to move almost all of our minting towards either boosting, and/or directly minting towards community values and artifacts as decided by that community’s governance.

Yeah, part of the problem is that people are incentivized to add more posts, but not incentivized to curate / summarize / organize the discussion. I hope boosting would help here too.

Hmm, not sure about this. Under the proposed system, everyone’s like is equally valuable, so the “cred elite” would not have a stranglehold on the production of cred. Unless you take “cred elite” to really mean “the whole community”, and clearly the community should have the ability to gatekeep (so as to maintain its own culture, intentions, and identity).

In past discussions, we’ve agreed not to use individual contributors as examples of people earning too much or too little cred. It tends to make the discussion personal, and adds more heat than light.

I think that bringing in specific identity groups as examples tends to have the same effect. It heats up the discussion, and increases the stakes of disagreement or misinterpretation, without actually making the underlying issue any clearer.

What do you think?

The problem in that situation is that people’s efforts were going unnoticed by the community. This isn’t something the algorithm can solve. Minting some cred directly to them for showing up (but where they still get 0 social feedback or engagement) won’t fix things.

Agreed. It’s a different domain. Minting on likes (to catch quality issues, etc) and minting on merged PRs might be a better heuristic.

s_ben · January 28, 2020, 3:48am

Thanks for the extra explanation. Was convinced like-minting was the right direction, but just wanted to raise some concerns while we’re still at this early design phase. More confident now in the general direction.

Open to such a rule if it becomes evident we have a problem with this. However, not being able to talk about specific identity groups could inhibit open discussion around those groups, such as their marginalization. I realize this can be problematic, especially if you’re not a member of the group you’re talking about. And perhaps my using this example was unnecessary. I think the current code of conduct (which appears close to merging) provides good guidance here. Have also been thinking that, at least for groups that aren’t too big, confidential reporting seems like a good option to have. I welcome any and all critique, privately or through confidential reporting.

Another thing I’m thinking about that potentially mitigates this is the alpha parameter, which can be tweaked to make cred more or less “leaky”. Presumably should collusion of any kind become a problem, making cred leakier could be a robust feedback mechanism.
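A generic sketch of why a leakier alpha could dampen collusion. It is plain PageRank with a uniform teleport vector, and it assumes alpha means "probability of jumping back to the seed vector at each step"; it is not SourceCred's implementation. The four-node graph is made up: A and B only link to each other (a colluding pair), C links to A, and D links to C.

```python
def pagerank(out_links, alpha, iterations=500):
    """Power iteration for PageRank with a uniform teleport (seed) vector.

    alpha is assumed to be the probability of jumping back to the seed at
    each step; every node in out_links must have at least one outgoing link.
    """
    nodes = list(out_links)
    seed = 1.0 / len(nodes)
    score = {node: seed for node in nodes}
    for _ in range(iterations):
        # Every node first receives its share of the teleported mass.
        next_score = {node: alpha * seed for node in nodes}
        # The remaining mass flows along outgoing links, split evenly.
        for node, targets in out_links.items():
            share = (1 - alpha) * score[node] / len(targets)
            for target in targets:
                next_score[target] += share
        score = next_score
    return score


# Made-up graph: A and B form a closed loop that absorbs incoming score.
graph = {"A": ["B"], "B": ["A"], "C": ["A"], "D": ["C"]}


def clique_share(alpha):
    """Combined score held by the colluding pair A and B."""
    score = pagerank(graph, alpha)
    return score["A"] + score["B"]


for alpha in (0.05, 0.5):
    print(f"alpha={alpha}: clique holds {clique_share(alpha):.3f} of the score")
```

In this toy graph the closed loop holds a smaller share of the total score as alpha grows, because more of the score leaks back to the seed on every step instead of circulating inside the loop.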
