Status
Completed
Champion
Initiative description
This is a proposed change to the GraphQL Mirror API and implementation to make it more robust to incorrect and inconsistent query responses sent to us by the GitHub API.
Each field in a GraphQL schema has a declared return type. For example,
the user
field on the top-level query type of the GitHub API returns
User
, and the user
field on the Reaction
object type also returns
User
.
However, sometimes GitHub returns IDs for objects whose actual types do not match the types declared by the specification.
Example
For instance, the following query reports that the given reaction was
created by a User
with a specific ID…
{
node(id: "MDg6UmVhY3Rpb24yMTY3ODkyNQ==") {
__typename
... on Reaction {
user {
__typename
id
login
}
}
}
}
{
"data": {
"node": {
"__typename": "Reaction",
"user": {
"__typename": "User",
"id": "MDEyOk9yZ2FuaXphdGlvbjE3OTUyOTI1",
"login": "webineh"
}
}
}
}
…but if we then ask for more properties about that User
we are
surprised to see that it is actually an Organization
!
{
node(id: "MDEyOk9yZ2FuaXphdGlvbjE3OTUyOTI1") {
__typename
id
}
}
{
"data": {
"node": {
"__typename": "Organization",
"id": "MDEyOk9yZ2FuaXphdGlvbjE3OTUyOTI1"
}
}
}
This throws our loading pipeline for a loop, because we’ve received
inconsistent data from the server. Currently, we work around this by
manually blacklisting IDs of objects for which we’ve observed this
problem (e.g., the reaction ID MDg6UmVhY3Rpb24yMTY3ODkyNQ==
in the
example above). Not only is this tedious, but it also means that we
don’t properly count cred from those contributions and contributors: we
ignore them entirely.
The purpose of this initiative is to make the GraphQL mirror structurally robust to these kinds of errors, such that we do not need to manually blacklist nodes. We should be able to tell the mirror that certain fields (or all fields) are “untrustworthy”, and that it should expect that results may be inconsistent and should take appropriate precautions.
Benefits
After this initiative is completed:
- SourceCred will be able to confidently load repositories without manual intervention or blacklist updating.
- SourceCred will be able to properly compute cred for users who have converted their accounts to organizations, and for bots like Greenkeeper that make automated pull requests.
Note that the first point is especially important for new users to SourceCred. If your first experience after hearing about this exciting project is, “cool, let me try it on my repository—wait, it’s just broken, and I have to send a pull request upstream with some weird IDs to fix it”, then there’s a fairly high chance that you just leave.
Implementation plan
GitHub issue #998 contains discussions of high-level system designs and a recommended option in the issue body, lower-level API designs in a comment, and a step-by-step implementation plan in a subsequent comment. These are fairly detailed; please refer to them for more information.
Estimated work (hours)
Around 20–40 focused hours for implementation. I’ve already spent at least 6–12 hours considering the problem space and solution space (including some helpful discussions with @decentralion), which led to the above design and implementation plans.
As with the GraphQL Mirror EAV refactor, it is critical that this work be done with careful attention to correctness, because it loads key data into a persistent store.
Dependencies
None, other than the creation of the Mirror module itself. The other current GraphQL Mirror initiative (the EAV refactor) is basically entirely unrelated, because it only covers primitive fields and fidelity only matters for non-primitive fields.
References
GitHub issues about this initiative:
- Main tracking issue: Make Mirror robust to input data with
incorrect
__typename
s (#998) - Catalog of upstream offenders: Known contract violation errors in the GitHub GraphQL API (#996)
Issues and pull requests that would not need to exist in a world where this were implemented (many from external contributors—not the best first experience):
- https://github.com/sourcecred/sourcecred/issues/1023
- https://github.com/sourcecred/sourcecred/issues/1353
- https://github.com/sourcecred/sourcecred/issues/1362
- Most changes to
blacklistedObjectIds.js
Contributions
From this initiative:
- Pre-work: #1609, #1610, #1612, #1615, #1660
- Primary work: #1662, #1663, #1664, #1665, #1666, #1667, #1668, #1669
Previously, contributor @wpank began a sequence of pull requests to address this issue, but these were never merged and have since gone stale.