These issues can sometimes be worked around by updating the list of blacklisted object IDs, but that process is itself high-friction and poorly documented, and it requires the user to invalidate their cache and re-download the whole project.
The net consequence of all these issues is that running SourceCred on arbitrary GitHub projects is a substandard experience that frustrates users.
However, thanks to the object blacklisting system, none of these issues block SourceCred’s dogfooding, so it’s been easy for us to ignore these problems.
Benefits:
We’ll stop frustrating early users
It will become possible to reliably run SourceCred on large repositories
This issue has been an active pain point for at least a year, and has been near the top of my mind that whole time. However, I have not seriously prioritized it, and am still not prioritizing it. Basically, I want to focus on making the CredSperiment a big success (see: deep before wide).
You can think of prioritizing the CredSperiment over external users as putting on our own oxygen mask before helping others. For any feature within SourceCred, having a high-performing and well-rewarded core community around SourceCred will make that feature easier to ship, because we’ll have more resources. Hence, I’m focused on features that improve SourceCred’s internal ability to coordinate and execute.
That said, I do think this is a very important issue, and if anyone else feels called to start working on it, I encourage them. As soon as I get the ability to put my cred where my mouth is via initiative bounties, I’ll give this one a hefty cred bounty.
Yes! While I realize why this is lower priority than other things, I do badly want to run this on Bitcoin, go-ethereum, and some other big repos that SC is currently choking on. Creating those scores, along with some cool-looking data viz, could be very interesting to the respective communities, and would probably make the rounds on social media and crypto outlets and start some interesting discussions.
The completion of the fidelity awareness initiative (already listed as a dep) resolves half of the listed failure modes: “invariant violation on missing Reaction author”, “Failures due to entity typename changes”, “Other contract violations in the API”. The set of blacklisted IDs is no longer used at all (but the blacklisting infrastructure still exists in case we need a quick stopgap later).
The “Maximum call stack size exceeded” bug is fixed by #1684, which I just merged.
So, the two remaining items are GitHub rate limiting and node deletions.
I hope that #1687 (“Don’t fetch GitHub Commit parents”) will have a significant impact on both rate limiting and load times, as (from qualitative experience) these problems seem to come up most often on repositories with long chains of commits, none of which were merged via a pull request.
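
To illustrate why long commit chains might hurt, here is a rough sketch in TypeScript. It is a toy model, not SourceCred’s actual mirror code: the names `linearHistory` and `fetchRounds` are made up for this example, and it assumes a loader that only learns about a commit’s parents after fetching that commit, so each newly discovered parent costs another round of requests.

```typescript
// Hypothetical model for illustration only; not SourceCred's implementation.
type CommitId = string;

interface Commit {
  id: CommitId;
  parents: CommitId[];
}

// Build a linear history c0 <- c1 <- ... <- c(length-1), where each commit
// has the previous commit as its only parent.
function linearHistory(length: number): Map<CommitId, Commit> {
  const commits = new Map<CommitId, Commit>();
  for (let i = 0; i < length; i++) {
    const parents = i === 0 ? [] : [`c${i - 1}`];
    commits.set(`c${i}`, { id: `c${i}`, parents });
  }
  return commits;
}

// Count the sequential fetch rounds needed when starting from `head`.
// Each round fetches every commit discovered in the previous round.
// Assumption: parents become known only after their child is fetched.
function fetchRounds(
  remote: Map<CommitId, Commit>,
  head: CommitId,
  followParents: boolean
): number {
  const seen = new Set<CommitId>([head]);
  let frontier: CommitId[] = [head];
  let rounds = 0;
  while (frontier.length > 0) {
    rounds++;
    const next: CommitId[] = [];
    for (const id of frontier) {
      const commit = remote.get(id);
      if (commit === undefined || !followParents) {
        continue;
      }
      for (const parent of commit.parents) {
        if (!seen.has(parent)) {
          seen.add(parent);
          next.push(parent);
        }
      }
    }
    frontier = next;
  }
  return rounds;
}

const remote = linearHistory(1000);
const withParents = fetchRounds(remote, "c999", true);
const withoutParents = fetchRounds(remote, "c999", false);
console.log({ withParents, withoutParents });
```

In this model, a chain of 1000 commits takes 1000 sequential rounds when parents are followed and a single round when they are not. Real queries are batched and more complicated, so treat the numbers as a picture of the scaling rather than a measurement.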
Proper node deletion handling is still an issue (as linked in the OP).