SourceCred

Summary of planning chat

Yesterday I had a chat with @decentralion about what to work on next, mono-repo vs. multi-repo structures, the refactoring we need, etc. (I don’t know how long a Discord link stays valid, but here it is.)

Milestones and versioned releases

A few milestones we converged on:

  1. Adding initiatives.
  2. Urgent UI and infrastructure* improvements for the CredSperiment.
  3. Migrating to instances and a modular system design.

* Not physical infrastructure (e.g. servers), but, for example, a solid CLI command to calculate payouts instead of the current magic process of updating notebooks.

These are ordered by the larger planning guideline to:

  • First focus on CredSperiment and making SourceCred work well for its own community.
  • Then move on to making SourceCred easier to use for other open-source projects and making it work well for them.

The instance system is a good example of making things easier for other projects, so it’s further down compared to direct improvements for CredSperiment.

We also felt it makes sense to mark releases based on these milestones:

  • v0.5.0 = Initiatives release.
  • v0.6.0 = UI and infrastructure reworked.
  • v0.7.0 = (Modular) instances system.

I wouldn’t say these are set in stone (other than initiatives, as I’m championing that :muscle:), but they are perhaps a useful indicator.


Multi- vs mono-repo and modular design

For context:

There’s a need to split some of sourcecred/sourcecred's code into modules we can use elsewhere. For example, some of its code has been copy-pasted or reproduced in sourcecred/widgets and sourcecred/payouts.

Conversely we have plugins, but they’re baked into the core software at sourcecred/sourcecred's src/plugins. Actually pulling those out of the core will force us to develop plugins the way a 3rd party developer would too (dogfooding our plugin system).

In this chat:

There’s a worry that multi-repo will add overhead in keeping tooling, quality and releases consistent and synchronized. A tools repo could be a half-solution but doesn’t fully avoid this. Mono-repo, on the other hand, can lead to taking shortcuts and coupling modules, like we’re doing with the plugins now.

What we could agree on though is:

  • The “core” needs to be a module which we can depend on as a library.
    It can expose common utilities and types as well.
  • Core needs a serious diet.
    Taking the UI, the plugins, maybe the CLI as well, out of it into other modules.
  • Those non-core modules could be in multi-repo or mono-repo.

In particular would love to hear @wchargin’s take on this structure.


dogfooding our plugin system

Let me propose a similar but countervailing suggestion. Pull plugins out of core, at an interface level and a filesystem directory level, but keep them in the main repository for a while yet. Collocate any new plugins alongside the existing ones in the main repository. If this is all easy to do, excellent! We can migrate them all out of the repository without difficulty. If not—if we find that new plugins or changes to our current plugins are suggesting changes to the core APIs—then it will be a good thing that we held off on physically splitting them out. It will be good that we added the coupling that we did, because the alternative would have been adding backdoor “experimental only!” public APIs, doing a version bump, updating dependencies in the subproject, using the new hacky API, adding a half-hearted TODO to migrate off of it eventually, and more likely than not ending up with a long-standing blemish on the final product.

Our larger danger here is not that we might fail to extract enough. It is that we might settle onto the wrong abstraction. A plugin system is one of the heaviest-weight kinds of abstractions that you can get (second only, maybe, to code generation systems). This makes it especially hard to revise design decisions. The multi-stage “A/AB/B” compatibility windows and other overhead that multirepos enforce are great once you’re at the point that you care about compatibility. When you’re still exploring, they’re severely limiting.

Will Tracz popularized a “rule of threes”: write three implementations of an interface before solidifying it. If you write just one, you’re probably missing important points in the feature space. If you write two, you’ll have a better idea, but it may be difficult or awkward to adopt new implementations. If you write three, you’ll be fine. Where do we stand today? We have the GitHub and Discourse plugins, and also the identity plugin. The latter is quite different from the other two, in ways that reveal that we really don’t understand all the pieces yet.

As proof of feasibility: the src/graphql/ subtree has been carefully developed as an independent package. It has no dependencies whatsoever on the rest of SourceCred (core or plugins), and never has. Being in the same repository didn’t make this any harder. We just treated the package boundary as if it were a repository boundary, even though there was no such technical restriction. And the monorepo property has been helpful, as we’ve been able to make changes like PR #961 and PR #1337 atomically, to say nothing of developing the upcoming fidelity enhancements!
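Treating a package boundary as if it were a repository boundary could also be checked mechanically, for example in CI. The following is only a minimal sketch: it assumes import specifiers have already been extracted from each file, and the function names and sample paths are hypothetical rather than taken from the SourceCred codebase.

```typescript
// Hypothetical boundary check; names and paths are illustrative only.

// Resolve a relative import specifier against the importing file's path.
// Returns null for bare specifiers (e.g. npm packages), which this sketch ignores.
function resolveRelative(fromFile: string, specifier: string): string | null {
  if (specifier.charAt(0) !== ".") {
    return null;
  }
  // Start from the directory that contains the importing file.
  const parts = fromFile.split("/").slice(0, -1);
  for (const segment of specifier.split("/")) {
    if (segment === "" || segment === ".") {
      continue;
    }
    if (segment === "..") {
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return parts.join("/");
}

// True if the import stays inside packageRoot, or is a bare specifier.
function staysInsidePackage(
  packageRoot: string,
  fromFile: string,
  specifier: string
): boolean {
  const resolved = resolveRelative(fromFile, specifier);
  if (resolved === null) {
    return true;
  }
  return resolved === packageRoot || resolved.indexOf(packageRoot + "/") === 0;
}
```

With `packageRoot` set to `src/graphql`, an import of `./schema` from a file in that directory passes, while an import of `../core/graph` is flagged because it resolves outside the subtree.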


+1 to what William said.

My 2c: the repository refactoring described here makes sense to me as a long-term outcome (long term being >6 months in the future). For now, I don’t think there’s any urgent need to pull things out of sourcecred/sourcecred; in the case of sourcecred/payouts, I think by far the most efficient path is to simply collapse all the payout logic into sourcecred/payouts.

I do want to make it possible to depend on the code outside of sourcecred/sourcecred, i.e. make it possible to import (say) the Graph module, or logic for TimelineCred. This would be most valuable for Cred Analysis Notebooks. It would also enable a more robust interface/implementation for sourcecred/widgets. But achieving this doesn’t require changing the repository structure.

I think there are 3 “scopes” of SourceCred we can think about:

  • Scope 0: Only SourceCred uses SourceCred.
    • (where we are now)
    • APIs, interfaces, and workflows can be very messy
    • can break compat whenever
  • Scope 1: Other projects use SourceCred, mostly “off the shelf”
    • projects are turning on SourceCred instances that are very similar to our own usage: GitHub, possibly Discourse, Cred+Grain, etc
    • we need the SourceCred Instance System for usability
    • plugin interfaces are not mature, all plugins come from core
    • shouldn’t break compat on workflows, can break compat on interfaces
  • Scope 2: SourceCred as a protocol
    • Other projects are extensively using SourceCred APIs, possibly in ways that are fairly different from how SourceCred is dogfooding
    • Consumers are developing plugins for their own use case that are not expected to merge upstream
    • shouldn’t break backcompat on workflows or interfaces
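The compatibility rules attached to each scope can be written down as data, which makes the policy easy to query. This is just a sketch of the list above: the type and function names are hypothetical and nothing like this exists in SourceCred as far as this thread says.

```typescript
// Hypothetical encoding of the three scopes listed above.
type Surface = "workflows" | "interfaces";

interface ScopePolicy {
  description: string;
  // Whether a breaking change to each surface is acceptable in this scope.
  mayBreak: Record<Surface, boolean>;
}

// Indexed by scope number (0, 1, 2).
const SCOPES: ScopePolicy[] = [
  {
    description: "Scope 0: only SourceCred uses SourceCred",
    mayBreak: {workflows: true, interfaces: true},
  },
  {
    description: "Scope 1: other projects use SourceCred, mostly off the shelf",
    mayBreak: {workflows: false, interfaces: true},
  },
  {
    description: "Scope 2: SourceCred as a protocol",
    mayBreak: {workflows: false, interfaces: false},
  },
];

// Look up whether a breaking change to the given surface is acceptable.
function mayBreak(scope: number, surface: Surface): boolean {
  const policy = SCOPES[scope];
  if (policy === undefined) {
    throw new Error(`Unknown scope: ${scope}`);
  }
  return policy.mayBreak[surface];
}
```

For example, `mayBreak(1, "interfaces")` is true while `mayBreak(1, "workflows")` is false, matching the Scope 1 bullet.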

The repository refactoring to improve / enforce the ability to add new plugins is something we only need at Scope 2. However, even getting to Scope 1 is months away, and going to multi-repo doesn’t help at all within Scope 1. So we should hold off on multi-repo work until we are well into Scope 1.


Good point. I do agree with an approach where we don’t settle for a hacky plugin API, and I agree that plugin APIs are heavyweight abstractions.

That said, one current interface that didn’t have 3 users when we hacked it in is the intermediate formats, like the scores. These now end up driving the widgets, payouts, several notebooks and the stack lookup client.

Each of them has a dependency on SourceCred data, but the monolith repository is not ready to be used as a library dependency. For example, the notebooks would implode when they find they need to compile better-sqlite3, pull in webpack, and transpile Flow and ESM.

So I do stand by the idea that a short-term core module that acts as a library is much needed. The only way I can see both happening right now would be bundling and multiple entrypoints. For instance, a payouts module could depend on some core utils like compat and score format parsing, plus some payout-specific code. With tree-shaking, it bundles without the CLI, the UI, the mirror and the fs code, making it something you could target a browser with.
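To make the idea of a browser-safe payouts entrypoint more concrete, here is a rough sketch of what such a module might contain: a check on a versioned payload, plus the payout arithmetic, with no CLI, UI, mirror or fs code. Everything in it is an assumption for illustration: the `[header, payload]` shape, the `example/scores` type string, the version, the field names and the proportional split are not the actual SourceCred scores format or payout rules.

```typescript
// Hypothetical payouts entrypoint. The data shapes below are assumptions,
// not the actual SourceCred scores format.

interface CompatInfo {
  type: string;
  version: string;
}

// Assumed shape: a [header, payload] pair, where the header names the format.
type Versioned<T> = [CompatInfo, T];

// Contributor id -> score.
type Scores = Record<string, number>;

const SCORES_COMPAT: CompatInfo = {type: "example/scores", version: "0.1.0"};

// Unwrap a versioned payload, rejecting unexpected types or versions.
function parseVersioned<T>(expected: CompatInfo, raw: Versioned<T>): T {
  const [header, payload] = raw;
  if (header.type !== expected.type || header.version !== expected.version) {
    throw new Error(
      `Expected ${expected.type}@${expected.version}, ` +
        `got ${header.type}@${header.version}`
    );
  }
  return payload;
}

// Split a budget across contributors in proportion to their scores.
function computePayouts(scores: Scores, budget: number): Record<string, number> {
  let total = 0;
  for (const id of Object.keys(scores)) {
    total += scores[id];
  }
  const payouts: Record<string, number> = {};
  for (const id of Object.keys(scores)) {
    payouts[id] = total > 0 ? (budget * scores[id]) / total : 0;
  }
  return payouts;
}
```

Because the sketch only touches plain data and arithmetic, a bundler would have nothing Node-specific to pull in for it.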

If anything though, that increases coupling more than it forces us to think about public APIs. So to prepare for this, a convention would be needed. For example, each module gets an index.js that defines its public API, and other modules can only import from there?
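A convention like this could be enforced by a small lint-style check. The sketch below assumes import targets have already been resolved to repository-relative paths and that module roots are listed by hand and do not nest; the function names and example paths are hypothetical.

```typescript
// Hypothetical check for an "import only from index.js" convention.

// Find the module root that contains a path, or null if none does.
// Assumes module roots are not nested inside one another.
function owningModule(path: string, moduleRoots: string[]): string | null {
  for (const root of moduleRoots) {
    if (path === root || path.indexOf(root + "/") === 0) {
      return root;
    }
  }
  return null;
}

// An import is allowed if it stays within one module, targets code outside
// any listed module, or enters another module through its index.js.
function isAllowedImport(
  fromFile: string,
  target: string,
  moduleRoots: string[]
): boolean {
  const source = owningModule(fromFile, moduleRoots);
  const dest = owningModule(target, moduleRoots);
  if (dest === null || source === dest) {
    return true;
  }
  return (
    target === dest ||
    target === dest + "/index" ||
    target === dest + "/index.js"
  );
}
```

With module roots `src/core` and `src/payouts`, an import from `src/payouts/cli.js` to `src/core/index.js` passes, while one that reaches into `src/core/graph.js` directly is rejected.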