Visualizing the SourceCred graph

Not sure where to put this so I’m just creating a new thread. If this is better as a comment elsewhere feel free to move it or let me know and I’ll delete and repost.

Came across this tweet visualizing the progress of the PRs on the Prysm project (Eth2.0). Something similar for SourceCred would be really cool, where someone could, if they wanted, play back the history of a project and see all the connections and how cred was earned. Obviously that’s a nice-to-have vs. a core feature, but it really does help to make the data come alive in a clear and intuitive way :slight_smile:

So essentially, like this demo, but with a timeline feature. Maybe first displaying the simple historical chart, and then giving users the option to view it more dynamically? If that could be integrated into the homepage of the website it would make the value prop of SourceCred immediately intuitive for so many people.

If that’s too complex or requires too much computation, maybe just take snapshots at regular intervals (like this) that the user could cycle through to see progress on project development and the flow of cred/grain in response?


I think visualizing the graph would be helpful in a lot of places.

One that I’m imagining is in the new Explorer UI, showing the neighbor nodes in a graphic and being able to use them for navigation.

For ones that visualize the entire graph, I would experiment with existing tools to see how it turns out. My gut says it will be pretty tangled and have a lot of nodes, so it may not be very telling by shape. Giving the nodes sizes based on their cred score might be really interesting though.


I’ve got a fair amount of experience with graph visualizers (worked alongside the team that made the TensorBoard graph visualizer). The SourceCred graph is absolutely enormous and making sense of it “raw” (or even rendering it) will be impossible. That means if we want to use a graph visualizer, we need either:

  • To aggressively filter the set of nodes in scope, e.g. show only nodes 1-degree away from a target node. However, it’s easy for a user to be connected to 1k+ nodes, which means using this technique on users (the most interesting nodes) is already a non-starter
  • To find some way to “compress” the graph, extracting only salient information
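For concreteness, the first option is just a breadth-first neighborhood filter. A minimal sketch over a plain adjacency dict (the node names are made up; SourceCred’s real graph API differs):

```python
from collections import deque

def neighborhood(adj, target, radius=1):
    """Return all nodes within `radius` hops of `target` (plain BFS)."""
    seen = {target}
    frontier = deque([(target, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen

# Toy undirected graph standing in for a contribution graph.
adj = {}
for a, b in [("user/alice", "pull/1"), ("pull/1", "commit/a"),
             ("user/bob", "pull/1"), ("commit/a", "file/core.js")]:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(sorted(neighborhood(adj, "user/alice")))  # ['pull/1', 'user/alice']
```

Even on this toy graph you can see the scaling problem: a real user node fans out to thousands of neighbors at radius 1.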

As an example of graph “compression”: maybe we could find a way to collapse the graph down to supernodes and users, while still maintaining meaningful edges. E.g. the “Discourse Artifact” supernode may not be directly connected to @beanow, but it may have a lot of cred paths to @beanow. I need to think about the math and probably talk to @mzargham, but I suspect we could do this graph collapse by designating every supernode as a seed, seeing which users its cred flows to, and then normalizing by the node’s own cred. (We might need to do this once per time interval, though.)
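To make the idea concrete, here is a toy sketch of that collapse using plain personalized PageRank (power iteration over an adjacency dict). All node names are invented and this is not SourceCred’s actual cred computation; it only illustrates “seed at a supernode, measure flow to users, normalize by the node’s own cred”:

```python
def pagerank(adj, seeds, alpha=0.85, iters=100):
    """Personalized PageRank: teleport mass is split evenly among `seeds`."""
    nodes = list(adj)
    share = 1.0 / len(seeds)
    score = {n: (share if n in seeds else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: ((1 - alpha) * share if n in seeds else 0.0) for n in nodes}
        for n in nodes:
            out = adj[n]
            if not out:  # dangling node: return its mass to the seeds
                for s in seeds:
                    nxt[s] += alpha * score[n] * share
                continue
            w = alpha * score[n] / len(out)
            for m in out:
                nxt[m] += w
        score = nxt
    return score

def collapse(adj, supernodes, users):
    """Supernode -> user edge weights for the collapsed graph."""
    baseline = pagerank(adj, seeds=set(adj))      # ordinary run: every node seeds
    edges = {}
    for sn in supernodes:
        flow = pagerank(adj, seeds={sn})          # seed only this supernode
        for u in users:
            edges[(sn, u)] = flow[u] / baseline[sn]  # normalize by sn's own cred
    return edges

# A supernode linked to two topics, which are linked to users.
adj = {
    "artifact/discourse": {"topic/1", "topic/2"},
    "topic/1": {"user/beanow"},
    "topic/2": {"user/beanow", "user/lb"},
    "user/beanow": set(),
    "user/lb": set(),
}
for (sn, u), w in sorted(collapse(adj, ["artifact/discourse"],
                                  ["user/beanow", "user/lb"]).items()):
    print(f"{sn} -> {u}: {w:.3f}")
```

In the toy graph, the user reachable along two paths ends up with the heavier collapsed edge, which is the kind of “indirect connection” the supernode-and-user map is meant to surface.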

If that technique worked and we could collapse down to a faithful supernode-and-user map, then we could drive a really interesting graph explorer that represents content at a level of abstraction that’s intelligible to users.

(Note this would only be meaningful for projects that made extensive use of the supernode system… if most of the cred was flowing based on activity, it wouldn’t produce meaningful results.)

We could use a similar technique to do a user->user reduced cred map, which would be great for discovering collaboration patterns (or cliques!).
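A cruder cousin of that user→user map, just to show the shape of the output: project the bipartite user/contribution graph onto users, weighting each pair by how many contributions they touched together (toy data, not cred-weighted):

```python
from collections import Counter
from itertools import combinations

def collaboration_weights(touched_by):
    """touched_by: contribution -> set of users. Returns (user, user) -> count."""
    weights = Counter()
    for users in touched_by.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return weights

touched_by = {
    "pull/1": {"alice", "bob"},
    "pull/2": {"alice", "bob", "carol"},
    "issue/3": {"carol"},
}
print(collaboration_weights(touched_by))
# Counter({('alice', 'bob'): 2, ('alice', 'carol'): 1, ('bob', 'carol'): 1})
```

Cliques would show up as groups of pairs with high mutual weights; the cred-seeding version would additionally weight each shared contribution by how much cred it carries.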


Having trouble wrapping my head around “normalizing by the node’s own cred”, but I’m new to graph theory, so I wouldn’t worry too much about that :) This does bring up an idea that’s been rattling around my brain lately though, which is normalizing by repo/maintainer. Basically, SC does a surprisingly good job “out of the box”, with default parameters. As @Beanow, myself, and now @burrrata have discovered, it’s really fun and insightful for repos you already know. However, in my experience, SC is not good at comparing activity across repos. A mainly front-end repo with frequent changes, for instance, will generate a lot more cred than a repo containing blockchain consensus code, which (hopefully) doesn’t change much, and the number of changes is not necessarily indicative of the significance/value of each change.

However, within a repo, it is fairly obvious to spot the “core contributors” (i.e. full-time devs adding lots of value). This seems like an obvious point of reference. Indeed, it’s one that projects tend to focus on (e.g. “we just need more of person X!”). What if we could normalize by that?

Generalizing this point, perhaps the question is, “from the perspective of a user, what are the meaningful categorizations they typically apply?”. Off the top of my head, my first questions would be, “who is actually working on this?”, “are they meaningful to me in other contexts (e.g. am I potentially going to work with them in the future on something else)?”, “how does this relate to other initiatives, particularly those I might be paid to work on?”. It might also be cool to see some general abstract overviews, potentially beautiful ones created by artists working with data.


Potentially related work:

https://sourcecred.io/odyssey-hackathon/

The “SourceCred Explorer” (currently out of order)

From A Gentle Introduction to Cred

If we expand a single node, we can see how that node received its cred via its connections to other nodes. At the top level, it aggregates groups of connections based on the type of edge, and the type of node the edge connects to. The percentages show what fraction of the node’s cred came from that connection, and the numbers show how much total cred came from that connection.

Then, diving down within a particular group of connections, we can see all of the individual edges along with how much cred they contributed.

If we want to learn more about a particular edge, we can expand it to see the node that edge connects to. This gives us the ability to dive into the graph from a fresh starting point. As you go “deeper” in your exploration of the graph, the color becomes deeper as well.
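As a rough sketch, the top-level aggregation the quoted passage describes might look like this: group a node’s inbound connections by (edge type, neighbor node type), then report each group’s total cred and its fraction of the node’s cred. The type names below are made up for illustration:

```python
from collections import defaultdict

def aggregate_connections(inbound):
    """inbound: list of (edge_type, node_type, cred) flowing into one node."""
    groups = defaultdict(float)
    for edge_type, node_type, cred in inbound:
        groups[(edge_type, node_type)] += cred
    total = sum(groups.values())
    return {k: (v, v / total) for k, v in groups.items()}

inbound = [
    ("AUTHORS", "pull", 60.0),
    ("AUTHORS", "pull", 20.0),
    ("REACTS", "comment", 20.0),
]
for (etype, ntype), (cred, frac) in aggregate_connections(inbound).items():
    print(f"{etype}/{ntype}: {cred:.0f} cred ({frac:.0%})")
# AUTHORS/pull: 80 cred (80%)
# REACTS/comment: 20 cred (20%)
```

“Diving down” within a group then just means listing the group’s individual edges instead of the summed row.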

A lot of this needs to be curated in new, up-to-date issues. I think a lot of the thinking from the period you are citing remains relevant, but there were simply other priorities at the time. I remain interested in the visualization work-stream, but I do not have a lot of time to dedicate to it at the moment.

Probably the most help I can be is in co-mapping out an up-to-date visualization initiative. Since I’m unable to champion it personally, my offer is to support someone who is interested in taking it on.