Make the GitHub Plugin Robust

Status: proposal

Champion:

Initiative Description:

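Currently, the GitHub plugin is fragile. Loading a large repository is likely to fail in a variety of ways, for example:

  • Invariant violation on missing Reaction author
  • Failures due to entity typename changes
  • Other contract violations in the API
  • “Maximum call stack size exceeded” errors
  • GitHub rate limiting
  • Improper handling of node deletions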

These issues can sometimes be worked around by updating the list of blacklisted object IDs, but this is itself a high-friction and poorly documented process, and requires the user to invalidate their cache and re-download the whole project.

The net consequence of all these issues is that running SourceCred on arbitrary GitHub projects is a substandard experience which creates a lot of frustration for users.

However, thanks to the object blacklisting system, none of these issues block SourceCred’s dogfooding, so it’s been easy for us to ignore these problems.

Benefits:

  • We’ll stop frustrating early users
  • It will become possible to reliably run SourceCred on large repositories

Implementation plan:

TBD

Estimated Work (hours):

40-80? I think it’s a lot of work.

Dependencies:

References:

Contributions:

TODO (add contributions that already apply, and new ones)

This issue has been an active pain point for at least a year, and has been near the top of my mind that whole time. However, I have not seriously prioritized it, and am still not prioritizing it. Basically, I want to focus on making the CredSperiment a big success (see: deep before wide).

You can think of prioritizing the CredSperiment over external users as being like putting on our oxygen mask before others’. Basically, for any feature within SourceCred, having a high-performing and well-rewarded core community around SourceCred will make that feature easier to ship, because we’ll have more resources. Hence, I’m focused on features that improve SourceCred’s internal ability to coordinate and execute.

That said, I do think this is a very important issue, and if anyone else feels called to start working on it, I encourage them. As soon as I get the ability to put my cred where my mouth is via initiative bounties, I’ll give this one a hefty cred bounty.

Yes! While I realize why this is lower priority than other things, I do badly want to run this on Bitcoin, go-ethereum, and some other big repos that SC is currently choking on. Those scores, plus some cool-looking data viz, could be very interesting to the respective communities, and would probably make the rounds on social media and crypto outlets and start some interesting discussions.

Quick status update:

  • The completion of the fidelity awareness initiative (already listed as a dep) resolves half of the listed failure modes: “invariant violation on missing Reaction author”, “Failures due to entity typename changes”, “Other contract violations in the API”. The set of blacklisted IDs is no longer used at all (but the blacklisting infrastructure still exists in case we need a quick stopgap later).
  • The “Maximum call stack size exceeded” bug is fixed by #1684, which I just merged.

So, the two remaining items are GitHub rate limiting and node deletions. I hope that #1687 (“Don’t fetch GitHub Commit parents”) will have a significant impact on both rate limiting and load times: in my (qualitative) experience, rate limiting comes up most often on repositories with long chains of commits, none of which were merged via a pull request. Proper node deletion handling is still an issue (as linked in the OP).
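
For anyone picking up the rate limiting item, here is a minimal sketch of one possible approach: check the remaining budget before each query and, if it is too low, wait until the window resets. This is not how the plugin currently works. The `RateLimit` type and the `msToWait` function are hypothetical names made up for illustration; the `remaining` and `resetAt` fields are modeled on the `rateLimit` object in GitHub's GraphQL API.

```typescript
// Hypothetical type, modeled on the `rateLimit` object that GitHub's
// GraphQL API can return alongside a query result.
interface RateLimit {
  remaining: number; // points left in the current rate limit window
  resetAt: string; // ISO 8601 timestamp at which the window resets
}

// Hypothetical helper: returns how many milliseconds to wait before
// sending a query expected to cost `cost` points. Returns 0 when the
// remaining budget already covers the query.
function msToWait(
  limit: RateLimit,
  cost: number,
  nowMs: number,
  paddingMs: number = 1000
): number {
  if (limit.remaining >= cost) {
    return 0;
  }
  const resetMs = Date.parse(limit.resetAt);
  // If the reset time has already passed, wait only for the padding.
  return Math.max(0, resetMs - nowMs) + paddingMs;
}
```

For example, with `remaining: 0`, a `resetAt` that is 30 seconds in the future, and the default padding, `msToWait` returns 31000. A fetch loop could sleep for that long instead of failing outright. Taking `nowMs` as a parameter, rather than calling `Date.now()` inside, keeps the helper easy to test.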