SourceCred

Wasabi Wallet Documentation

What’s up peers!

I am a contributor and maintainer of the Wasabi Wallet documentation, an archive of knowledge about the nuances of Bitcoin privacy and how to use Wasabi properly for self-defense. Wasabi Wallet is an unfairly private Bitcoin wallet, with several cutting-edge features under the hood, including built-in ZeroLink CoinJoin.

As you might be aware, just recently we tried a contribution game with the main Wasabi repository, and the results were very positive! Here we got introduced to your project, and we think that SourceCred might be exactly the tool we are looking for!

In the meantime, we started work on the Wasabi docs, as we realize how important the proper education of users is. The zkSnacks company decided to sponsor a monthly budget to pay for the maintenance of the docs repo. I instantly thought of SourceCred as a tool to figure out the quantity and quality of the contributions, so I’m opening this thread to start the conversation!

A couple things that might be relevant here:

  • It is a libre & open source documentation under the MIT license.
  • Already there are 9 contributors, hopefully growing soon.
  • I am the maintainer, and thus merge most PRs.
  • The website is built with VuePress, the integration is pretty much finished with only fine tuning left to do.
  • Most of the future commits should be in the markdown files with the content.
  • We have a writing convention that every sentence starts on a new line; this might help your algo a lot.
  • Some of the content is imported from already existing articles.

So, I’m really curious about how we can set up and tweak SourceCred to do its magic for the docs repo. I’d very much appreciate your insights into how we can fine-tune your tool so that it fits the criteria of the documentation. I’m looking forward to using your tool, and collaborating with you to make it work properly!

Many thanks, Max

I could not add more than two links, so in this follow-up:

Hi @max, welcome to the SourceCred discourse! Super glad to have you here.

First off, nice work on the Wasabi Wallet docs. I spent a while reading it this morning and learned quite a bit about privacy in Bitcoin. I hope that as SourceCred grows it will attract the help of people as talented as yourself to help document it. :wink:

On to the matter at hand: calibrating SourceCred for use on WasabiDocs. I’ll note that @s_ben has written some great thoughts here: SourceCred for Documentation.

First, let’s take a look at how well SourceCred works on WasabiDocs out-of-the-box. Here’s the cred on zkSNACKs/WasabiDoc with the default settings:

Before digging further, I’m curious what you think of these scores. Do they seem mostly right? Do you think certain contributors seem under- or over- valued?

One of the things we talked about in our Twitter chat (and @s_ben brings up in his thread) is using # of lines changed as a heuristic for setting the weight. So I made a prototype that has this behavior so we can see the effects.

Let me briefly explain the concept of “node weight” in SourceCred. In the current implementation, a node’s weight has two effects: it increases the total absolute amount of cred in a period, and it makes cred “start flowing” at that node, to then propagate outwards according to the PageRank algorithm and the structure of the graph. By default, every GitHub contribution gets a weight as follows:

SourceCred also supports manually setting the weights on individual nodes. So, to come up with a lines-of-code experimental branch, I set the default node weight for every non-pull request to 0, and then set a manual weight for each pull request according to the following simple algorithm:

function scoreFor(p: R.Pull) {
  if (p.mergedAs() == null) {
    // no cred for unmerged code.
    return 0;
  }
  return p.additions();
}

We might want a clever-er algorithm. For example, we might want to consider just net additions, or to modify the score based on what %-age of the lines changed were net additions. But for this experiment I just went with the simplest thing.
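As a rough sketch of the two variants mentioned above (net additions, and scaling by the share of changed lines that are net additions), something like the following could work. `PullStats` is a hypothetical stand-in with only the fields needed here; it is not SourceCred's actual `R.Pull` API, and neither function is part of the prototype.

```typescript
// Hypothetical minimal shape of a pull request, for illustration only.
interface PullStats {
  merged: boolean;
  additions: number;
  deletions: number;
}

// Variant 1: count only net additions, never going below zero.
function netAdditionScore(p: PullStats): number {
  if (!p.merged) {
    // no cred for unmerged code, as in the simple version
    return 0;
  }
  return Math.max(0, p.additions - p.deletions);
}

// Variant 2: scale the additions by the fraction of changed lines
// that are net additions.
function netFractionScore(p: PullStats): number {
  if (!p.merged) {
    return 0;
  }
  const changed = p.additions + p.deletions;
  if (changed === 0) {
    return 0;
  }
  const net = Math.max(0, p.additions - p.deletions);
  return p.additions * (net / changed);
}
```

Under either variant, a pull that rewrites one line (one addition, one deletion) scores 0, so pure typo fixes would need some minimum weight if they should still earn cred.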

Here is an instance using this weight algorithm:

There’s one very big difference, which is that the absolute cred scores have changed a lot – now the sum of all scores is about equal to the sum of lines of code changed. (It will be off by a bit because there’s a time decay factor, so for recent pulls not all the cred has been “issued” yet.) However, the relative rankings are pretty similar. They’re similar because the underlying graph structure is the same, and the structure of the graph has a great deal of influence on the scores.

There’s a parameter that lets us influence this: alpha, which basically determines how strongly cred returns to the seed nodes (in this case, pull requests) vs. how easily it flows across the graph. You can think of increasing alpha as increasing the “stickiness” of cred to whatever its source is.
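To make the "stickiness" intuition concrete, here is a toy seeded PageRank. It is only a sketch under simplifying assumptions (uniform edge weights, no time decay, and a made-up graph representation); it is not SourceCred's implementation, and `seededPageRank`, `Edges`, and `Scores` are names invented for this example.

```typescript
// node id -> ids of the nodes it passes cred to (toy representation)
type Edges = Record<string, string[]>;
type Scores = Record<string, number>;

// Assumes every edge target also appears as a key in `edges`.
function seededPageRank(
  edges: Edges,
  seed: Scores,
  alpha: number,
  iterations: number = 200
): Scores {
  const nodes = Object.keys(edges);
  // Normalize the seed weights into a probability distribution.
  let seedTotal = 0;
  for (const n of nodes) seedTotal += seed[n] || 0;
  if (seedTotal <= 0) throw new Error("seed weights must sum to a positive number");
  const seedDist: Scores = {};
  for (const n of nodes) seedDist[n] = (seed[n] || 0) / seedTotal;

  let scores: Scores = {};
  for (const n of nodes) scores[n] = seedDist[n];

  for (let i = 0; i < iterations; i++) {
    const next: Scores = {};
    // A fraction alpha of all cred returns to the seed nodes each step.
    for (const n of nodes) next[n] = alpha * seedDist[n];
    for (const n of nodes) {
      const out = edges[n];
      const share = (1 - alpha) * scores[n];
      if (out.length === 0) {
        // A node with no outgoing edges sends its cred back to the seeds.
        for (const m of nodes) next[m] += share * seedDist[m];
      } else {
        // Otherwise the remaining cred is split evenly over outgoing edges.
        for (const m of out) {
          if (!(m in next)) throw new Error("unknown edge target: " + m);
          next[m] += share / out.length;
        }
      }
    }
    scores = next;
  }
  return scores;
}
```

For a three-node toy graph where a seeded pull links to its author and a reviewer, and both link back to the pull, the pull's share of the cred works out to 1 / (2 - alpha): about 0.56 at alpha=0.2 and about 0.67 at alpha=0.5. Raising alpha keeps more cred at the seed, which is the effect described above.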

The default UI doesn’t yet expose alpha, but I added it in for this experiment. So in the instance linked above, you can set alpha to a higher value (for example, alpha=0.5) and then click “recompute cred”.

Now with alpha=0.5, # of lines changed has a much bigger effect on the distribution, and the cred shifts a lot. Now it’s you, dennis, and thunderBiscuit that have almost all of the cred; others like michaelToth have basically fallen off.

If we wanted to, we could also configure SourceCred so that every person’s score is simply how many lines of code they authored. Right now, if (e.g.) dennis authors a huge PR, not all of that cred will stay with him. Some will go to him, some will flow to the reviewers or people who commented on it. Of the cred that goes to him, some will flow out from his other interactions; e.g. if he gives a thumbs up to someone else’s pull, then that pull will transitively get some cred from Dennis’s giant pull.

If we want, we can turn this behavior off, by changing the weights so that authorship is the only thing that matters.

This has a super dramatic effect on the output cred. Now dennis has the most cred.

My example of Dennis having a huge PR was not hypothetical. :slight_smile: The reason that Dennis leads when we only consider authorship is because of this pull, which added 10k lines: https://github.com/zkSNACKs/WasabiDoc/pull/26

That pull references issue #2 (written by you) and mentions you and thunderbiscuit by name, so with the default settings, the cred for this giant PR smooths out over several engaged contributors. With the pure-lines-of-code metric, it all just accrues to Dennis. I think this is a good example of the robustness that SourceCred gets from using the PageRank algorithm rather than just counting. It’s also a good use case for SourceCred’s ability to set manual/override weights on individual pieces of content.

I’m planning to run my own “contribution game” for SourceCred itself (see: Dogfooding SourceCred). For that experiment, I’m planning to review the weights of important contributions, to provide a bit of a manual assist to SourceCred. Some of the same considerations in that thread will apply here, e.g.: will you get paid from the contribution game?

I’ll stop here, curious to get your thoughts!

(Also: I used my moderator tools to boost you out of “new user” status, so you shouldn’t have any more issues re: number of links you can post.)

:eyes: I was expecting a couple of insights into the tool, not such an awesome answer with several working prototypes already built specifically for this project. Wow - this is fantastic - thank you very much!!!

I am impressed by the gut-feeling accuracy of the defaults, but I think some fine tuning is needed.

In regards to the huge PR by Dennis - this has many lines of code, because it introduces a bunch of config files and such. So this might skew the result quite heavily. However, in this case, this PR was of MASSIVE value to me and the doc in general, so I think it might not even be accounted for high enough.

I would have expected that Dennis has about 1.5x the creds compared to Thunder. But I might have this intuition also based on our private DMs, and these are of course not considered here.

However, I do think that now that we have the vuepress setup, most of the commits should be content only, which might level everything out. This also means that past performance is not indicative of what future contributions will look like…

I’m really intrigued by this alpha calculation - and especially that it can be directed via the reactions, this is a good addition!

It makes a lot of sense to take the percentage of lines changed into account as well, because then a typo fix will be regarded differently, and I already have a bunch of commits that change only one letter…

Wow - really - I am VERY impressed by this, thank you again for helping out with such enthusiasm!!

What do you suggest as next steps? What other information do you need from me? How would we integrate SourceCred into the Wasabi docs? Do we need to install anything in the repo, and how is this hosted?

From a hosting perspective, it’s all very simple. Basically, the SourceCred frontend is all self-contained; it just needs any HTTP server to provide the webapp along with some static JSON, but there’s no back-end. Which means that you can host a SourceCred instance anywhere you like; I personally tend to host instances on GitHub pages.

I would recommend making a new repository like zkSNACKs/WasabiDocCred, and setting it up so that it is hosted on GitHub pages. Then we can add an update script to that repo, and whenever the script is invoked, it will run SourceCred on zkSNACKs/WasabiDoc, and save the resultant frontend into the WasabiDocCred repo. Then you (or whoever is running it) can review it, commit it, and push it to GitHub pages. (Or you can just connect it to a cron job and have it automatically update every day; I think that @Beanow did something like that for SFOSC’s cred.)

A simple implementation of the script could be that it sets up SourceCred as a submodule, runs yarn to install the deps, and then calls ./sourcecred/scripts/build_static_site.sh --target . --project zkSNACKs/WasabiDoc. Though we’d want to add a bit more complexity to get stuff like caching of the data from GitHub… I’ll be happy to help set it up.

SourceCred also has a CLI command for getting the scores programmatically. So at the end of the month you could use that to get the exact scores so as to divide up the bitcoin according to cred.
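Once the scores are in hand, dividing the budget is simple arithmetic. The sketch below assumes the scores have already been loaded into a plain name-to-cred object (the CLI's actual output format is not shown in this thread) and that the budget is a whole number of satoshis. `splitBudget` is a name made up for this example.

```typescript
// Split an integer budget (e.g. satoshis) in proportion to cred scores.
// Largest-remainder rounding makes the payouts sum exactly to the budget.
function splitBudget(
  cred: Record<string, number>,
  budget: number
): Record<string, number> {
  const names = Object.keys(cred);
  let total = 0;
  for (const n of names) total += cred[n];
  if (total <= 0) throw new Error("total cred must be positive");

  const payouts: Record<string, number> = {};
  const remainders: { name: string; frac: number }[] = [];
  let paid = 0;
  for (const n of names) {
    const exact = (budget * cred[n]) / total;
    const whole = Math.floor(exact);
    payouts[n] = whole;
    paid += whole;
    remainders.push({ name: n, frac: exact - whole });
  }

  // Hand out the leftover units to the largest fractional parts first.
  remainders.sort((a, b) => b.frac - a.frac);
  const leftover = budget - paid;
  for (let i = 0; i < leftover; i++) {
    payouts[remainders[i % remainders.length].name] += 1;
  }
  return payouts;
}
```

For instance, with hypothetical contributors `alice` at 2 cred and `bob` at 1 cred and a budget of 100 units, alice receives 67 and bob 33.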

There’s also the issue of getting the lines of code weights integrated. It sounds like using the LOC weighting is a pretty big improvement from your perspective, so we should make sure it’s included in your instance. For the prototype in this thread, I implemented line of code weighting in a hacky way. It works, but before merging into master I will want to refactor it into a more maintainable solution. That said, there’s nothing stopping you from using the same prototype for now. It works, it will just add a bit more complexity to the update script; and we could easily migrate off the prototype and onto master once I merge official support.

But overall, I’d say: if you want to set up SourceCred on WasabiDoc tomorrow, we could make it happen. :smile: Let me know what your timeline is. Also, you can spend some time playing with the weights on any of the instances above; if you find settings you like, we can canonicalize them for your project.
