Cred Analysis Notebooks

Status: help wanted

New types of Notebooks are created occasionally, but there’s room for a lot more. This is also a great starting point for new contributors. Please give a shoutout if you’d like to help.

Champion:

None atm. It would be very helpful if a Champion could oversee the quality, organization and prioritized wishlist of Notebooks.

Initiative Description:

Observable Notebooks have proven extremely useful tools for building new interfaces, experiences, and workflows on top of SourceCred. For example, look at the payout notebooks (week 1, week 2).

However, I believe we’ve only scratched the surface of what we could do with notebooks. We could use them to build whole new explorations or deep dives into SourceCred scores themselves. Here are a few concrete examples:

  • Person<->Person cred flow analysis: For each user in SourceCred, run SourceCred treating that person as the “seed node”. See how the cred distributes across other users. This will show which people are collaborating and endorsing each other, and whether some people are acting as “cred sinks” or “cred black holes”.

  • Initiative-scoped cred: For a given initiative, treat that initiative as a seed, and see who has cred in the initiative.

  • Reverse-initiative scoped cred: For any given person, back out which initiatives give them cred.
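To make the “seed node” idea in the examples above concrete, here is a minimal sketch of a seeded (personalized) PageRank on a toy graph. It is not SourceCred’s implementation: the function name `seededPagerank`, the `alpha` teleport parameter, and the three-person graph are all assumptions made for illustration, and a real cred graph also contains non-person nodes such as posts and initiatives.

```javascript
// Toy seeded ("personalized") PageRank by power iteration.
// ASSUMPTION: this is an illustrative sketch, not SourceCred's algorithm.
// `edges` is a list of [src, dst, weight] with strictly positive weights.
// `seed` is the node that receives all "teleport" score on every step.
function seededPagerank(nodes, edges, seed, alpha = 0.15, iterations = 100) {
  // Total outgoing weight per node, used to normalize edge weights.
  const outWeight = new Map(nodes.map((n) => [n, 0]));
  for (const [src, , w] of edges) {
    outWeight.set(src, outWeight.get(src) + w);
  }

  // Start with all of the score on the seed.
  let score = new Map(nodes.map((n) => [n, n === seed ? 1 : 0]));

  for (let i = 0; i < iterations; i++) {
    const next = new Map(nodes.map((n) => [n, 0]));
    // Each node passes (1 - alpha) of its score along its out-edges,
    // split in proportion to the edge weights.
    for (const [src, dst, w] of edges) {
      const share = (1 - alpha) * score.get(src) * (w / outWeight.get(src));
      next.set(dst, next.get(dst) + share);
    }
    // Whatever was not passed along (teleport mass, plus score sitting on
    // nodes without out-edges) returns to the seed, so the total stays 1.
    let distributed = 0;
    for (const v of next.values()) distributed += v;
    next.set(seed, next.get(seed) + (1 - distributed));
    score = next;
  }
  return score;
}

// Hypothetical three-person graph: alice endorses bob twice as strongly as carol.
const people = ["alice", "bob", "carol"];
const endorsements = [
  ["alice", "bob", 2],
  ["alice", "carol", 1],
  ["bob", "alice", 1],
  ["carol", "alice", 1],
];
const fromAlice = seededPagerank(people, endorsements, "alice");
```

Running this once per person, with that person as the seed, would give one row of a person-to-person flow table. A person who receives a lot of score but passes little on to others would stand out as a candidate “cred sink”.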

Benefits:

We will be able to iterate on and share SourceCred research and analysis dramatically faster. Right now, our iteration cycle is constrained by updating the core SourceCred codebase (slow and expensive). We can move R&D iteration into notebooks (fast and cheap).

Implementation plan:

  • Once we make it possible to depend on the SourceCred core from notebooks (see below), we should be able to start experimenting and making these notebooks.

Alternative:

  • Use the needs of each Notebook to flesh out a library as we go, with utility functions, SourceCred core abstractions, etc.

Estimated Work (hours):

Probably a day per notebook. 1-2 weeks to create a versatile library as the basis for experimentation.

Dependencies:

References:

Wishlist

Contributions:

@decentralion should be a wiki :smiley_cat:

I’ve added different kinds of notebooks as a contribution towards this. And would like to collect notebook ideas as “wishlist” references.

More importantly, I think this is a good place to get started, so I added the help wanted status and explanation.

Also added an alternative implementation route, of developing a library as we create more notebooks. And adjusted estimated work to accommodate a bit for less familiarity with the project internals.

I’d like to help! Mostly because I’d like an entry point to deeply understand how the SourceCred protocol works, and also because making Cred Analysis Notebooks easier for people to use will help us design, model, document, and improve the CredSperiment.

Awesome! Really curious to explore this. Besides just diving in, is there a recommended way to get started on this? Also, if I spend a few days noodling around with the notebooks could we then hop on a call to discuss and explore questions?

Are these required before work on this Initiative can begin, or just before the Initiative can achieve its full potential?

Reading through the code and notebooks attempting to understand what’s going on. Since I always take notes, might as well keep track of them here in case that’s useful for anyone.

Payout Notebooks

Starting with these because they’re the most up-to-date notebooks the SourceCred community seems to have. It looks like the notebooks work like this: someone runs the SourceCred protocol over SourceCred’s GitHub (is it just the sourcecred repo, or all the repos?) and Discourse. That person (currently @decentralion) then updates the Cred repo with that week’s weights and scores. That data is then pulled into an Observable notebook (like this one) by linking to the scores_file and the distribution_history_file.

So the notebooks in the SourceCred Contributor Payouts thread only compute the Grain payouts based on the Cred scores. If we wanted to create notebooks to calculate how Cred might fluctuate with the introduction of various mechanisms, we would need an entirely different system, because we would be modeling the Cred itself, not the Grain that results from the Cred.
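To illustrate the distinction, here is a sketch of what “computing Grain from Cred” can look like when the Cred scores are taken as fixed input. It assumes the simplest possible policy, a payout proportional to each contributor’s share of Cred; the actual payout notebooks may use a different policy. The function name `distributeGrain`, the contributor names, and the numbers are all hypothetical.

```javascript
// ASSUMPTION: purely proportional payout; not necessarily the policy used
// by the real payout notebooks.
// `credByUser` maps a contributor name to a cred score; `budget` is the
// amount of Grain to hand out in this period.
function distributeGrain(credByUser, budget) {
  const totalCred = Object.values(credByUser).reduce((sum, c) => sum + c, 0);
  const payouts = {};
  for (const [user, cred] of Object.entries(credByUser)) {
    // Each contributor receives the same fraction of the budget as their
    // fraction of the total cred.
    payouts[user] = totalCred === 0 ? 0 : (budget * cred) / totalCred;
  }
  return payouts;
}

// Made-up scores for illustration only.
const payouts = distributeGrain({ alice: 60, bob: 30, carol: 10 }, 1000);
```

Note that nothing here touches the graph or the weights: changing how Cred itself is computed happens upstream of a function like this.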


Cred Repo

In the Cred repo weights file, it looks like some nodes have been manually bumped, mostly by 4x (one by 8x):

    "nodeManualWeights": {
      "N\u0000sourcecred\u0000discourse\u0000topic\u0000https://discourse.sourcecred.io\u0000248\u0000": 8,
      "N\u0000sourcecred\u0000discourse\u0000topic\u0000https://discourse.sourcecred.io\u0000269\u0000": 4,
      "N\u0000sourcecred\u0000discourse\u0000topic\u0000https://discourse.sourcecred.io\u0000270\u0000": 4,
      "N\u0000sourcecred\u0000discourse\u0000topic\u0000https://discourse.sourcecred.io\u0000291\u0000": 4,
      "N\u0000sourcecred\u0000discourse\u0000topic\u0000https://discourse.sourcecred.io\u0000327\u0000": 4
    },

According to this Issue and this commit, these correspond to:

  • Produce the SourceCred Podcast: Shipped the podcast!
  • Discourse Reference Detection: Recently completed.
  • Champions & Heroes: An exciting contribution. :slight_smile:
  • Initiatives Plugin: Nice progress so far.
  • Community call notes

Would be great if we had code comments here to explain that, but JSON does not allow for comments, right?
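Since the weights file cannot carry comments, one workaround is to decode the keys in a notebook cell and print them in a readable form. The sketch below only relies on what is visible in the excerpt above: each key starts with `N` and its parts are separated (and terminated) by the NUL character `\u0000`. The helper names `decodeNodeAddress` and `describeManualWeights` are hypothetical, not part of SourceCred.

```javascript
// ASSUMPTION: the key layout is inferred from the excerpt above
// ("N", then NUL-separated parts, then a trailing NUL). These helpers are
// illustrative and are not SourceCred APIs.
function decodeNodeAddress(raw) {
  const pieces = raw.split("\u0000");
  const prefix = pieces[0];
  const trailing = pieces[pieces.length - 1];
  if (prefix !== "N" || trailing !== "") {
    throw new Error("unexpected node address format");
  }
  // Drop the "N" prefix and the empty string after the final separator.
  return pieces.slice(1, -1);
}

// Turn a nodeManualWeights object into readable rows, heaviest first.
function describeManualWeights(nodeManualWeights) {
  return Object.entries(nodeManualWeights)
    .map(([raw, weight]) => ({ parts: decodeNodeAddress(raw), weight }))
    .sort((a, b) => b.weight - a.weight);
}

const rows = describeManualWeights({
  "N\u0000sourcecred\u0000discourse\u0000topic\u0000https://discourse.sourcecred.io\u0000269\u0000": 4,
  "N\u0000sourcecred\u0000discourse\u0000topic\u0000https://discourse.sourcecred.io\u0000248\u0000": 8,
});
```

Each row then shows the plugin, node type, server, and topic id next to the weight, which makes it easier to match a weight to the topic it boosts.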


How is Cred computed?

I’m a little confused: Cred appears to be computed in the main SourceCred repo in the weightEvaluator file (so that you can download it and run it on arbitrary repos and forums), in plugins like Initiatives (so that plugins define their own logic), and in the Cred repo (not sure why there are weights both here and in the main repo). Why is Cred computed in so many places, or am I just misinterpreting the codebase?


How does Cred flow?

Reading through pagerankGraph.js:

 * Every edge in the Graph is assigned an `EdgeWeight`, which includes a
 * `forwards` (weight from the `src` to the `dst`) and a `backwards`
 * (weight from the `dst` back to the `src`). Both `forwards` and
 * `backwards` must be nonnegative numbers. The weights influence how
 * score flows from node to node. For example, if the node `root` is
 * connected to `a` with a weight of `1` and to `b` with a weight of `2`,
 * then `b` will recieve twice as much score from `root` as `a` does.
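The `root`, `a`, `b` example in that comment can be checked with a few lines of arithmetic. This is a sketch of the proportional split only, under the assumption that a node’s outgoing score is divided in proportion to edge weight; it ignores the `backwards` weights and is not the code from pagerankGraph.js. The name `splitScore` is hypothetical.

```javascript
// ASSUMPTION: illustrative helper, not taken from pagerankGraph.js.
// Splits `score` among destinations in proportion to nonnegative weights.
function splitScore(score, outgoingWeights) {
  const entries = Object.entries(outgoingWeights);
  for (const [, w] of entries) {
    if (w < 0) throw new Error("edge weights must be nonnegative");
  }
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  const result = {};
  for (const [dst, w] of entries) {
    // With no outgoing weight at all, nothing flows anywhere.
    result[dst] = total === 0 ? 0 : (score * w) / total;
  }
  return result;
}

// `root` holds 3 units of score and is connected to `a` (weight 1)
// and `b` (weight 2).
const flows = splitScore(3, { a: 1, b: 2 });
// flows.a is 1 and flows.b is 2, so `b` receives twice as much as `a`.
```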

We’re discussing exactly how this works in the Questions About Initiatives - #8 by burrrata thread. I’ll update this thread with answers once I understand exactly how the connections between nodes work.

Also from pagerankGraph.js it looks like SourceCred creates a graph, then runs a (modified?) PageRank algorithm to score the graph, then all the SourceCred specific processing happens afterwards to boost or modify those scores?

 * At present, PagerankGraph does not support any modification to the
 * underlying Graph; doing so will invalidate PagerankGraph and cause
 * its methods to throw errors.
 */

@burrrata:

First off, there’s a ton of context here which I’m happy to discuss on a call.

Regarding understanding the flow of data processing within SourceCred, it works like this:

The weightEvaluator is a small piece of logic within that broader orchestration. In general, I would say the api/load module is the best place to look to understand the overall flow for generating cred.

The sourcecred/cred repository is the prototype cred instance or “cred repository” where we store the results for SourceCred itself. However, it doesn’t contain the logic for computing cred, except insofar as it has a small update script which invokes SourceCred command line tools.

In general, almost all of the processing happens when creating the graph and then applying the algorithm to the graph. This process gives us timeline cred: for every week, every node in the graph that existed as of that week has an associated cred score.

Overall Advice

Overall, I think the “help wanted” label on this initiative is mostly premature (I didn’t actually write that bit). Currently, it’s possible to access the cred data in a notebook, but it isn’t possible to inspect or modify the process that generates cred. In the future, it will be possible to load the graph for a given project into a notebook, but then drive the cred calculation process in the notebook, potentially changing the weights, parameters, or even tweaking how the cred algorithm works.

This is not possible yet because we can’t import SourceCred core modules into an Observable Notebook. To do this, we first need to Publish SourceCred on NPM. We will need an experienced JS developer to champion that before we can unlock the true potential of cred analysis notebooks.

However, there is still some interesting notebook work to be done; basically anything that we can do that depends on the cred scores but not on changing or re-computing the cred scores. One notebook that I started and haven’t finished is this participation power exploration. The idea is to formally explore participation power as a mathematical construct, looking at how the parameter choice and construction will affect things like the gini coefficient of power within SourceCred.

If you wanted to take on a notebook, that one could be a good place to start! You’ll definitely need to keep demonstrating independent sleuthing skills, and we could also have a call as a context dump where I share the “what, why, how, and alternatives considered” of that notebook.
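For anyone picking up that notebook, the Gini coefficient mentioned above is easy to compute from a list of scores. The sketch below uses the standard sorted-values formula, G = 2 * Σ(i * x_i) / (n * Σ x_i) - (n + 1) / n with i running from 1 to n over the ascending values. How participation power itself is derived from cred is not covered here, and the input numbers are made up.

```javascript
// Gini coefficient of a list of nonnegative scores.
// 0 means everyone holds the same amount; values near 1 mean that one
// participant holds almost everything.
function gini(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const total = sorted.reduce((sum, v) => sum + v, 0);
  // Treat an empty or all-zero distribution as perfectly equal.
  if (n === 0 || total === 0) return 0;
  let weighted = 0;
  sorted.forEach((v, i) => {
    weighted += (i + 1) * v; // rank (1-based) times value
  });
  return (2 * weighted) / (n * total) - (n + 1) / n;
}

// Made-up distributions for illustration only.
const equalPower = gini([1, 1, 1, 1]); // perfectly equal
const concentratedPower = gini([0, 0, 0, 10]); // one holder has everything
```

With a function like this in a notebook cell, the parameter choices for participation power can be varied and the resulting Gini values compared side by side.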

Also, I recommend watching my old SourceCred code walkthrough as a way to get context on the SourceCred codebase. A number of the specific details have changed; however, the overall tools and organizational patterns have not.
