First off, @ryanMorton: if you haven’t had a chance to play with the UI we made at the hackathon, check it out here. I used your work as a starting point, so you definitely have cred in that product. :slight_smile:
Personally, I’m not sure we should try to display the whole graph. I worry it would just be a noisy mess with that much information on one screen. My intention was to keep building up this subgraph viewer, on the assumption that fewer than 100 nodes are in scope at any time. The user chooses some nodes they are interested in, which anchor the scope, and then a score-aware graph traversal algorithm fills out the remaining nodes.
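One way such a score-aware traversal could work, sketched under stated assumptions: node IDs are strings, every node already has a score (e.g. its cred), and adjacency is given as undirected neighbor lists. The hypothetical `expandScope` helper does a greedy best-first expansion from the anchors, always admitting the highest-scoring node adjacent to the current scope, until the node cap is reached. This is an illustration of the idea, not an algorithm we’ve settled on.

```typescript
// Hypothetical shapes: each node's score (e.g. cred), and undirected adjacency lists.
type ScoreMap = { [nodeId: string]: number };
type Adjacency = { [nodeId: string]: string[] };

// Greedy best-first expansion: start from the anchor nodes, then repeatedly admit
// the highest-scoring node adjacent to the current scope until `maxNodes` nodes
// are in scope or nothing else is reachable.
function expandScope(
  anchors: string[],
  scores: ScoreMap,
  neighbors: Adjacency,
  maxNodes: number
): string[] {
  const inScope: { [nodeId: string]: boolean } = {};
  const frontier: { [nodeId: string]: boolean } = {};
  const result: string[] = [];

  // Move a node into scope and add its not-yet-admitted neighbors to the frontier.
  function admit(id: string): void {
    inScope[id] = true;
    delete frontier[id];
    result.push(id);
    for (const next of neighbors[id] || []) {
      if (!inScope[next]) frontier[next] = true;
    }
  }

  for (const anchor of anchors) {
    if (result.length >= maxNodes) break;
    if (!inScope[anchor]) admit(anchor);
  }

  while (result.length < maxNodes) {
    // Linear scan for the best frontier node; fine for a scope of ~100 nodes.
    let best: string | null = null;
    for (const id of Object.keys(frontier)) {
      if (best === null || (scores[id] || 0) > (scores[best] || 0)) best = id;
    }
    if (best === null) break; // frontier exhausted
    admit(best);
  }
  return result;
}
```

With a cap of 100 the linear scan is cheap; for a larger scope a priority queue would be the natural replacement.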
For example, suppose we are building a graph-based workflow for answering the question, “where did this node get cred from?” The user chooses the node they are interested in, and we find the paths to that node that contributed the most score, e.g. using @mzargham’s algorithm here. Using that, we find the 99 most relevant nodes and display them alongside the chosen one. The user can then double-click on a new node to select it, which brings a new group of nodes into scope.
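The algorithm referenced above is not reproduced here. As a swapped-in stand-in for the same idea, here is a max-product path search, under these assumptions: cred flows from `src` to `dst` of each (hypothetical) `FlowEdge` in proportion to its weight, as in a PageRank-style walk with the teleport term ignored, and a node’s contribution to the target is approximated by its score times the product of the fractions it passes along on its single best path. `topContributors` and its signature are made up for illustration.

```typescript
// Hypothetical edge shape: cred flows from `src` to `dst` in proportion to `weight`.
interface FlowEdge { src: string; dst: string; weight: number; }

// Returns up to `k` nodes ranked by the cred they push into `target` along
// their best (max-product) path. A heuristic stand-in, not an exact attribution.
function topContributors(
  target: string,
  edges: FlowEdge[],
  scores: { [id: string]: number },
  k: number
): string[] {
  // Total out-weight per node, so each edge weight becomes the fraction of its
  // source's cred that the edge carries; also index edges by destination.
  const outWeight: { [id: string]: number } = {};
  const incoming: { [id: string]: FlowEdge[] } = {};
  for (const e of edges) {
    outWeight[e.src] = (outWeight[e.src] || 0) + e.weight;
    if (!incoming[e.dst]) incoming[e.dst] = [];
    incoming[e.dst].push(e);
  }

  // reach[x] = largest product of edge fractions over any path x -> target.
  // Fractions are <= 1, so a Dijkstra-style search that always settles the
  // unsettled node with the largest reach computes these values correctly.
  const reach: { [id: string]: number } = {};
  const settled: { [id: string]: boolean } = {};
  reach[target] = 1;
  while (true) {
    let best: string | null = null;
    for (const id of Object.keys(reach)) {
      if (!settled[id] && (best === null || reach[id] > reach[best])) best = id;
    }
    if (best === null) break;
    settled[best] = true;
    // Walk edges backwards: a predecessor reaches the target via `best`.
    for (const e of incoming[best] || []) {
      const viaBest = reach[best] * (e.weight / outWeight[e.src]);
      if (!settled[e.src] && viaBest > (reach[e.src] || 0)) reach[e.src] = viaBest;
    }
  }

  // Rank by score * best-path fraction, i.e. cred delivered along that path.
  const candidates = Object.keys(reach).filter((id) => id !== target);
  candidates.sort((a, b) => (scores[b] || 0) * reach[b] - (scores[a] || 0) * reach[a]);
  return candidates.slice(0, k);
}
```

For the workflow described above, the viewer would call something like `topContributors(selectedNode, edges, scores, 99)` and render those nodes plus the selected one.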
This won’t be a great fit for our current graph, where the most interesting nodes (users) characteristically have enormously high degree. If my cred is split across 900 pull requests I authored, but we only display the top 100 nodes, then we’re going to miss a lot of the picture. We could try some aggregation like the current prototype does, though it’s not clear how that would work in a graph layout.
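One possible shape for that aggregation in a graph layout, assuming each neighbor of the selected node carries a hypothetical `kind` label (e.g. pull request, issue) and the cred attributed to its connection: show the top `limit` neighbors individually and fold the rest into one summary node per kind, which the layout could draw as a single “+N more” node. The names here are hypothetical and not taken from the current prototype.

```typescript
// Hypothetical shapes for a neighbor of the selected node and a summary node.
interface NeighborCred { id: string; kind: string; cred: number; }
interface AggregateNode { kind: string; count: number; totalCred: number; }

// Keep the `limit` highest-cred neighbors as individual nodes and collapse the
// remainder into one aggregate node per kind (in order of first appearance).
function summarizeNeighbors(
  neighbors: NeighborCred[],
  limit: number
): { shown: NeighborCred[]; aggregates: AggregateNode[] } {
  const sorted = neighbors.slice().sort((a, b) => b.cred - a.cred);
  const shown = sorted.slice(0, limit);
  const byKind: { [kind: string]: AggregateNode } = {};
  const aggregates: AggregateNode[] = [];
  for (const n of sorted.slice(limit)) {
    let agg = byKind[n.kind];
    if (!agg) {
      agg = { kind: n.kind, count: 0, totalCred: 0 };
      byKind[n.kind] = agg;
      aggregates.push(agg);
    }
    agg.count += 1;
    agg.totalCred += n.cred;
  }
  return { shown, aggregates };
}
```

In the 900-pull-request case with a limit of 99, this yields 99 individual nodes plus one summary node standing in for the other 801, so the total cred on the screen is still accounted for even though most pull requests aren’t drawn.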
Maybe this UI will wind up being most useful for inspecting low-degree nodes (e.g. a particular pull request) rather than users. (BTW, I expect a major use case for this UI will be manually adding new nodes/edges to the graph.) This could become a problem if nodes tend to grow higher and higher degree as we collect more data, so we may need a way to do “graph compression” or otherwise collapse nodes/edges together. (E.g. could we collapse a pull request and all of its comments into one node while maintaining the right cred-flow properties? cc @mzargham.)
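The mechanical half of that question, node contraction, might look like the sketch below: every node in a group (say, a pull request and its comments) is mapped onto one representative, edges internal to the group are dropped, and parallel edges are merged by summing their weights. `contractNodes` is a hypothetical helper, and it makes no attempt to guarantee that cred computed on the compressed graph matches the original; that part is exactly the open question.

```typescript
// Hypothetical weighted, directed edge.
interface WeightedEdge { src: string; dst: string; weight: number; }

// Collapse every node in `members` into the single node `into`.
function contractNodes(edges: WeightedEdge[], members: string[], into: string): WeightedEdge[] {
  const isMember: { [id: string]: boolean } = {};
  for (const m of members) isMember[m] = true;
  const rep = (id: string): string => (isMember[id] ? into : id);

  const merged: { [key: string]: WeightedEdge } = {};
  const out: WeightedEdge[] = [];
  for (const e of edges) {
    const src = rep(e.src);
    const dst = rep(e.dst);
    // Edges entirely inside the group become self-loops; this sketch drops them.
    if (src === dst) continue;
    const key = JSON.stringify([src, dst]);
    if (merged[key]) {
      merged[key].weight += e.weight; // merge parallel edges by summing weights
    } else {
      const edge = { src, dst, weight: e.weight };
      merged[key] = edge;
      out.push(edge);
    }
  }
  return out;
}
```

Dropping internal edges changes each member’s out-weight normalization, which is one concrete place where the cred-flow properties could drift.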
Also, @mzargham and I have discussed ways to make users less high-degree in the past; e.g. my user node has a connection to every month-long period in which I was active in the project, and those user-period nodes are connected to all the contributions from that period.
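A sketch of that user-period construction, assuming each contribution has a hypothetical `author` and a millisecond timestamp, periods are UTC calendar months, and the edge direction (user to period to contribution) is an arbitrary choice here. The effect is that a user’s degree becomes the number of months they were active rather than the number of contributions.

```typescript
// Hypothetical input and output shapes.
interface Contribution { id: string; author: string; timestampMs: number; }
interface PeriodEdge { src: string; dst: string; }

// "YYYY-MM" for the UTC calendar month containing the timestamp.
function monthKey(timestampMs: number): string {
  const d = new Date(timestampMs);
  const month = d.getUTCMonth() + 1;
  return d.getUTCFullYear() + "-" + (month < 10 ? "0" : "") + month;
}

// Route each contribution through an intermediate "author@YYYY-MM" node
// instead of connecting the author to every contribution directly.
function userPeriodEdges(contributions: Contribution[]): PeriodEdge[] {
  const seen: { [periodId: string]: boolean } = {};
  const edges: PeriodEdge[] = [];
  for (const c of contributions) {
    const periodId = c.author + "@" + monthKey(c.timestampMs);
    if (!seen[periodId]) {
      seen[periodId] = true;
      edges.push({ src: c.author, dst: periodId }); // user -> user-period (once)
    }
    edges.push({ src: periodId, dst: c.id }); // user-period -> contribution
  }
  return edges;
}
```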
From an implementation standpoint: for complex visualizations like this, I believe unit testing is very important; otherwise the code will become really hard to maintain as we keep adding features. We’ll also want a good API for interacting with the visualization, so that we can build UIs on top of the graph rather than in it. That implies finding a good way to fit the graph into React’s state and props abstractions. I’ll need to do some research to find a good way to do this; the prototype code from the hackathon isn’t really maintainable in that sense, and it has some gross state contamination between React and D3. For now, it’s probably best if you focus on prototyping algorithms and visualizations, and I’ll worry about productionizing them into a form that we can ship and maintain.
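One way to keep the graph inside React’s state/props model, sketched under assumptions: the set of anchor nodes lives in React state (e.g. held via the `useReducer` hook), every change goes through a pure reducer, and D3 only renders whatever props it is handed, never owning state of its own. Because the reducer is a plain function, it can be unit tested without a DOM. The state shape and action names (`select`, `addAnchor`) are hypothetical.

```typescript
// Hypothetical viewer state owned by React, not by D3.
interface ViewerState { anchors: string[]; }

type ViewerAction =
  | { type: "select"; nodeId: string } // e.g. double-click: re-anchor on this node
  | { type: "addAnchor"; nodeId: string }; // add another anchor to the scope

// Pure reducer: returns a new state object (or the same one if nothing changed),
// never mutating its input, so React can detect updates by reference.
function viewerReducer(state: ViewerState, action: ViewerAction): ViewerState {
  switch (action.type) {
    case "select":
      return { anchors: [action.nodeId] };
    case "addAnchor":
      return state.anchors.indexOf(action.nodeId) >= 0
        ? state
        : { anchors: state.anchors.concat([action.nodeId]) };
    default:
      return state;
  }
}
```

The scope computation and layout would then be derived from `anchors` on each render, and the D3 layer would receive the resulting nodes and edges as props.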