SourceCred

GraphQL Mirror fidelity awareness

Initiative description

This is a proposed change to the GraphQL Mirror API and implementation to make it more robust to incorrect and inconsistent query responses sent to us by the GitHub API.

Each field in a GraphQL schema has a declared return type. For example, the user field on the top-level query type of the GitHub API returns User, and the user field on the Reaction object type also returns User.

However, sometimes GitHub returns IDs for objects whose actual types do not match the types declared by the specification.

Example

For instance, the following query reports that the given reaction was created by a User with a specific ID…

{
  node(id: "MDg6UmVhY3Rpb24yMTY3ODkyNQ==") {
    __typename
    ... on Reaction {
      user {
        __typename
        id
        login
      }
    }
  }
}
{
  "data": {
    "node": {
      "__typename": "Reaction",
      "user": {
        "__typename": "User",
        "id": "MDEyOk9yZ2FuaXphdGlvbjE3OTUyOTI1",
        "login": "webineh"
      }
    }
  }
}

…but if we then ask for more properties about that User we are surprised to see that it is actually an Organization!

{
  node(id: "MDEyOk9yZ2FuaXphdGlvbjE3OTUyOTI1") {
    __typename
    id
  }
}
{
  "data": {
    "node": {
      "__typename": "Organization",
      "id": "MDEyOk9yZ2FuaXphdGlvbjE3OTUyOTI1"
    }
  }
}

This throws our loading pipeline for a loop, because we’ve received inconsistent data from the server. Currently, we work around this by manually blacklisting IDs of objects for which we’ve observed this problem (e.g., the reaction ID MDg6UmVhY3Rpb24yMTY3ODkyNQ== in the example above). Not only is this tedious, but it also means that we don’t properly count cred from those contributions and contributors: we ignore them entirely.

The purpose of this initiative is to make the GraphQL mirror structurally robust to these kinds of errors, such that we do not need to manually blacklist nodes. We should be able to tell the mirror that certain fields (or all fields) are “untrustworthy”, and that it should expect that results may be inconsistent and should take appropriate precautions.

Benefits

After this initiative is completed:

  • SourceCred will be able to confidently load repositories without manual intervention or blacklist updating.
  • SourceCred will be able to properly compute cred for users who have converted their accounts to organizations, and for bots like Greenkeeper that make automated pull requests.

Note that the first point is especially important for new users to SourceCred. If your first experience after hearing about this exciting project is, “cool, let me try it on my repository—wait, it’s just broken, and I have to send a pull request upstream with some weird IDs to fix it”, then there’s a fairly high chance that you just leave.

Implementation plan

GitHub issue #998 contains discussions of high-level system designs and a recommended option in the issue body, lower-level API designs in a comment, and a step-by-step implementation plan in a subsequent comment. These are fairly detailed; please refer to them for more information.

Estimated work (hours)

Around 20–40 focused hours for implementation. I’ve already spent at least 6–12 hours considering the problem space and solution space (including some helpful discussions with @decentralion), which led to the above design and implementation plans.

As with the GraphQL Mirror EAV refactor, it is critical that this work be done with careful attention to correctness, because it loads key data into a persistent store.

Dependencies

None, other than the creation of the Mirror module itself. The other current GraphQL Mirror initiative (the EAV refactor) is basically entirely unrelated, because it only covers primitive fields and fidelity only matters for non-primitive fields.

References

GitHub issues about this initiative:

Issues and pull requests that would not need to exist in a world where this were implemented (many from external contributors—not the best first experience):

Links to contributions

Previously, contributor @wpank began a sequence of pull requests to address this issue, but these were never merged and have since gone stale.

No active work yet.

2 Likes

Thank you for the detailed proposal, @wchargin. I’ve updated Make the GitHub Plugin Robust to point to this as a dependency.

Once we have Cred Bounties enabled, I propose we fund this with a large bounty.