Graph Mappings vs Graph Filtering

Motivation

During SourceCred office hours (Office Hours Agenda: 5/2), one of the discussion threads with @decentralion and @Brutalfluffy was the question of how to produce views of the SourceCred graph that met certain user experience requirements. The initial observation was that 'Filtering" a graph in order to ‘zoom out’ or ‘focus’ on a specific type of node (e.g. contributors) fails to preserve the network structure.

Observing that filtering does not meet our needs, this post presents the concept of Graph Mappings as an alternative to filtering data in order to produce graph objects which are at once consistent with a more detailed graph but reduced in scope to make them easier to explore as humans.

Preliminaries

Consider a SourceCred Graph G={V,E} where V is the set of nodes and E is a subset of V x V is the set of edges. We will further assume at each node i in V contains metadata and specific data about the object i; for now simply refer to all data about node i as D_i. Furthermore, for any edge representing a connection between two nodes i and j, there is a link e=(i,j) pointing from node i to node j and there is data D_e about the edge.

The data about nodes an edges can be types such as user, commit, comment etc; it also includes what plugin generated the node or edge as well any other information that is retrievable about the object. For the purpose of this discussion, we will define a graph filtering as a mapping that simple uses data to decide whether a particular node or edge is including the output of a transformation.

Graph Filtering

Consider F: G --> G_F where G_F = {V_F, E_F} is the filtered version of G, and the filter logic F operates on the nodes computing for each i in V whether it belongs in the node set V_F based on the data found in D_i. The filtering logic simple includes our excludes each node from the resulting graph. The problem with the filtering logic is that the resulting edge set E_F is necessarily a subset of V_F x V_F and a lot of important structural information is lost.

Graph Mapping

Consider M:G --> G_M where G_M = {V_M, E_M} is the mapped version of , and the mapping logic M operates on the nodes computing for each i in V whether it belongs in the node set V_F based on the data found in D_i. Unlike the filtering, the mapping does not determine for each node whether to include it or exclude it but rather produces a hierarchy where nodes are assigned to groups.

For example, if we wanted to view the network from the perspective of contributors (users) we could collapse all non-user nodes into groups identified by the user the nodes. This mapping thus does not delete the non-user nodes it conceptually absorbs them into the user nodes which means that it is possible to infer edges between the grouped nodes from the edges between non-user nodes.

More technical details regarding how such mappings can be implemented and what impact choices of mapping on the perspectives generated will be explored in the https://github.com/sourcecred/research repo.

As next steps are fleshed out I will write a more technically detailed ‘Issue’ for graph mapping “infra” there.

follow up comments:

per @decentralion suggestion, I am going to shift this thread toward practical examples which will i plan to accomplish via EDA in this branch https://github.com/sourcecred/research/tree/graph-mapper-prototype.

I started with a graph filtering by type which is clearly disconnected; next i’m prototyping an example of graph mapping.