Refactoring the load system

Description


The existing load system was quite complex, which made it hard to change. And it needed to change, in order to support Unified reference detection

This Initiative is about biting the bullet to break the existing load system into smaller parts that aren’t as difficult to change.


Status: Completed

The refactor itself has been implemented! :tada: Switch to LoadContext based loading :tada:

Champion

Avatar Beanow @Beanow


Benefits


  • Divides and conquers the complexity of the load system.
  • More robust unit tests.
  • Easier to implement changes in the future.
    (better tests = more confidence, less complexity = less headaches for devs)
  • Many implicit features of the load system become explicit.
    For example, we now have a definition of the “SourceCred data directory” we didn’t have before.

Implementation Plan


Apply head to keyboard


Deliverables


A design document which outlines the new concepts we’ll use to break the load system into smaller parts. And plenty of PRs to implement the refactor.


Dependencies


None at the moment.


References


Discussion / trackers

Related refactors, used for reference


Contributions


Design doc

LoadPlan version (concept rejected)

LoadContext version (concept accepted)

Prerequisite refactoring

Cleanup :broom:


1 Like

@decentralion can you give this a look to see if it makes sense? I’ve included some of your related work. Though not all, as WeightedGraph could be it’s own Initiative / Artifact.

Some I’m directly using, so they’re contributions:

Some are replaced but provided input. I’ve opted for references here, because they were intended to support WeightedGraph, not directly to address the complexity of load:

Fun fact, if you’re wondering why so many contributions. Some of the first work started way back in mid-December. It’s been a loooong time coming :sweat_smile:

image

Looks good to me.

Wow. So much work!

Selfish question: will this allow me to run SourceCred on /Bitcoin and other old, messy GitHub repos with lots of dirty data that SourceCred currently chokes on?

No, although @wchargin is working on a change that likely will. See: GraphQL Mirror fidelity awareness

Oooooh. :heart:'d Initiative for what it’s worth.

Added: Improve test readability of loadContext and pluginLoaders

Believe this is now completed! :rocket:

Yeah, I think this is completed. Although based on my experience hacking on CredRank during ETHDenver, I’m worried that the new load system goes overboard on DI in a way that is frustrating/has lots of boilerplate for making changes. So I think we may still have more iterations to arrive at a system that feels like a good balance of testability but also legibility / ease of working with. (cc @wchargin)

1 Like

Perhaps a good moment to share some retrospect thoughts :]

Agree there’s frustration/boilerplate left. However, this initiative/refactor started because the previous version was impossible spaghetti, where all the tests would break if you change how the pipeline itself worked. (Such as breaking up the mirror+graph step into two separate steps).

Introducing such pipeline changes have gone from nightmare to doable now, for example Expose a ReferenceDetector to createPluginGraphs made a change in the pipeline.

Something that I feel became more effortful is having to declare a Loader interface and object for every plugin. While more work, I think it’s an improvement over the before-situation of directly depending on plugin-internal functionality in the load pipeline.

Another trade-off that creates more effort, is how LoadContext is required to be a “dumb” composition class, not allowed to implement any logic. It makes for way more robust and easier testing, but is also an extra step and not the most intuitive abstraction.


A main source of frustration I experience still (before the refactor as well) is how configuration values propagate in multiple formats throughout the entire system. From src/cli to src/core to src/backend to src/plugins.

Having the instance system, reworked CLI and a better configuration abstraction, I think will greatly reduce code churn here.

Possibly some of these points will be included in Productivity / Quality of Life for instance maintainers. However:

This initiative is to save time and make it easier to maintain a SourceCred instance.

I’ve deliberately not included easier / time saving for developers here. So if these further refactors are time consuming and don’t help instance maintainers much, I’m inclined to punt on it for a while.

1 Like