RFC: Championed experiments

What

This is a proposal to maintain a limited number of code experiments and use them for the CredSperiment.

Why

We have a review process in place that reliably creates the feedback needed to keep smelly code from creeping into the project, and that sparks great discussions about the changes we’re making.

However, I think it’s worth acknowledging that it hampers our ability to conduct experiments, especially when it comes to code.

There are 2 code initiatives that I’m championing, one of which is Initiatives.

Working on these surfaced a lot of smells and quirks in the code, so great refactoring efforts came out of it. Those refactors, I believe, will pay off in the long run. But in the short term, it’s also painful that we haven’t been able to use Initiatives for the CredSperiment in the ~3 months this feature has been planned and discussed.

Roughly speaking, I think there are two general approaches to moving these experiments along faster:

  1. Lower our review standards, so that hacky code written for the sake of experiments is accepted.
  2. Maintain experimental code in its own branch/fork, not accepting it as good-enough long-term code, but using it for the CredSperiment.

Personally, I think option 1 would be a mistake that would come back to bite us. So my proposal is to consider option 2.

How

Each Maintainer will have 2 experiment slots.

An experiment must have a Champion, and the experiment should be community approved. By “should” I mean the TBD can veto, but should avoid doing so.

For simplicity, the maintainer and the champion are required to be the same person. I’ll refer to them as “the champion” from here on.

Rationale

Maintaining an experiment will require frequent attention, and will take time away from the “real” code. That’s why I think there should be limited slots. Even then, I think both the Champions and the community should question whether they have the time and commitment to maintain as many as 2.

It should also prevent the “everything is an experiment” trap. If we want more experiments but think the current ones are too valuable to stop, the only ways forward are to have more maintainers, or to implement existing experiments as “real” non-experimental features.

The reason only Maintainers should do this is that it has very real consequences for the CredSperiment. Being allowed to influence the CredSperiment, especially under relaxed review standards, requires skill, responsiveness, and a high level of community trust.

Example lifecycle of an experiment

Please make sure not to get too caught up in the details of this example; it’s only meant to illustrate the proposal.

1) Proposal

The champion writes a proposal for the experiment.

  • It should at least include the “what” and the “why”.
  • It should note that you would be the champion.

Proposals for starting a new experiment should be separate from proposals for ending one. If the champion doesn’t have any slots left, they should first propose ending one of their current experiments.

The TBD makes a judgement call about when there is a rough consensus.

2) Developing

Each experiment has its own branch. For example: experiment/beanow-initiatives.

And we’d have an “official” branch, such as credsperiment, which combines them. This branch shouldn’t be edited directly. Its only purpose is to frequently reset to master and merge the community-approved experiments in. (Manually for now, perhaps with a nightly CI to test it.)
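To make that concrete, rebuilding credsperiment by hand might look roughly like the sketch below. This is only an illustration; the exact flags, and whether we force-push the rebuilt branch, are assumptions rather than an agreed workflow.

```
git fetch origin
git checkout credsperiment
git reset --hard origin/master                                     # start over from the latest master
git merge --no-ff --no-edit origin/experiment/beanow-initiatives   # merge an approved experiment
# ...repeat for any other community-approved experiment branches...
git push --force-with-lease origin credsperiment                   # history was rewritten, so force-push
```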

To make changes to an experiment/* branch, PRs and reviews are used. But review standards here take into account that it is an experiment. The main goal of those PRs is a sanity check: Is this in line with the proposal? Are we doing anything that is really broken or flawed? Is there low-hanging fruit that would make this significantly better?

The champion is expected to frequently make sure we can reset credsperiment from master. Not doing so allows the TBD to end the experiment if it prevents the weekly update.

3) Amending an experiment

Hopefully the result of an experiment is that we’ll learn more and learn it sooner. So an experiment needs to be amendable.

This also should be in the form of a proposal. Make sure it includes:

  • Summary of lessons learned.
  • The “what” and “why” of the amendment.

4) Ending an experiment

Ideally the outcome of an experiment is one of:

  • It was valuable, so we’ve built it as a non-experimental feature!
  • We learned it wasn’t too great, so we’re abandoning it. (Yay! We learned things and saved time!)

That said, ending an experiment can be disruptive for the CredSperiment. So it shouldn’t be taken lightly.

The champion may end an experiment themselves; this makes sense, for example, if they’re overburdened. And as mentioned above, the TBD may end one if the experiment falls behind in maintenance.

But it’s strongly encouraged to use proposals whenever you can, especially if the reason could use feedback, like: “I think this experiment hasn’t turned out to be very valuable”. Not only is it worth checking whether that view is subjective; even if people agree, it’s good to have a record of lessons learned.

Requesting feedback

I’ve provided a concrete workflow of what it could look like; the details may change. Most of all, I would love to hear whether we should implement a system like this. :smile:


Cross posted:

Overall, I like this proposal, as it would allow us to keep high code quality in master, but also get more velocity for the CredSperiment. I have a question about how this would work in practice. Currently, we have one branch with ongoing maintenance, namely master. We know that master will always be in a good state because every PR merges into master, and we don’t merge it unless it passes CI.

Under this proposal, we would have master, credsperiment, and some number of experiment branches. The credsperiment branch is generated by merging all the experiment branches into latest master, and every week I would use credsperiment to actually generate canonical cred scores for grain payout purposes.

However: how do we know that credsperiment is going to be in a working state? If someone merges a PR into master that breaks one of the experiment branches, how will we know, and surface this issue for that experiment’s maintainer to fix it?

The answer should not be: “we find out that the credsperiment branch is broken when @decentralion actually tries to calculate cred for the week”. So basically the maintainer of each experiment needs to take responsibility for making sure that experiment is continually rebased on top of master for inclusion in the CredSperiment. What’s the process there?

:smiley:

At a high level, we could technically achieve a continuous guarantee that credsperiment works. But this would mean blocking PRs to master if they break credsperiment, so that’s no good.

What I believe should be possible is to allow changes to master to break credsperiment, but use CI to show a non-blocking warning before, or as soon as, master changes. Something like a Discord webhook or a GitHub issue about it happening, alerting experiment champions to take a look, should be possible too.

This way we do have faster-than-weekly feedback. If it’s too spammy, we could dial it down to a nightly check.
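As a rough sketch of what such a check could do, assuming a bash step in CI: the branch list, the DISCORD_WEBHOOK_URL variable, and how this is wired into CI are all placeholders, not decided details.

```
#!/usr/bin/env bash
# Sketch only: branch list, DISCORD_WEBHOOK_URL, and CI wiring are assumptions.
EXPERIMENTS="experiment/beanow-initiatives"   # extend with other approved experiments

git fetch origin
git checkout -B credsperiment-check origin/master   # throwaway branch off the latest master

broken=0
for branch in $EXPERIMENTS; do
  if ! git merge --no-edit "origin/$branch"; then
    broken=1
    git merge --abort   # clean up so the next merge attempt starts from a sane state
  fi
done

if [ "$broken" -ne 0 ]; then
  # Non-blocking alert: ping the experiment champions via a Discord webhook.
  curl -s -X POST -H "Content-Type: application/json" \
    -d '{"content": "Heads up: an experiment branch no longer merges cleanly onto master."}' \
    "$DISCORD_WEBHOOK_URL"
fi

exit 0   # never fail, so PRs to master are not blocked by this check
```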

An important piece is missing: What is the plan for graduation of an experiment?

Google’s monorepo has an //experimental top-level directory (which you can find references to in public docs; this isn’t secret information). Code in //experimental is subject to weaker review requirements, and code in the rest of Google is freely allowed to break it. But there is a clear policy that this is really only for experimental code. Anything that is expected to eventually run in production is not permitted to start in //experimental, and you are not permitted to copy code from //experimental into mainline: it’s expected to be re-written with proper standards for review, testing, and the like.

Even though this policy is clearly stated, and despite automated warnings, people still try to flagrantly violate it. People are rarely excited about actually productionizing their code rather than taking the easy way out. So we should be clear about expectations here: If we’ve determined that an experiment in SourceCred is worth promoting to the main repository, how do we proceed?

You’re right. The basic idea I had in mind here is to not allow experiments to be scalable.

You must champion and maintain (i.e. you’re perpetually responsible for) your experiment branches, and can only have 2. Even if this scaled along the one axis it could, more maintainers, it would just become more painful to do, because maintaining something like 6 * 2 experiment branches with constant breakage, rebasing, and conflict resolution would be a real hassle.

That hassle is probably the main reason you would want to end an experiment, whether you’re the champion carrying that burden, a community member who thinks too much time is being spent on it versus production code, or the TBD, who could veto it for being unproductive.

The same goes for new experiment proposals. A 6th experiment would probably be rejected just for being too cumbersome to maintain, no matter what it is. As intended :]


As far as the route to graduation goes, I didn’t define one because I could imagine multiple approaches.

Should an experiment feel valuable, as Initiatives hopefully does, it would be pretty disruptive to our own Cred distribution if we then removed the experiment because it’s planned for master.

One way could be to keep maintaining the experiment while simultaneously working to implement it on master, to prevent that disruption and to allow more community feedback. That will keep your experiment slot occupied, though.

Another could be to not buy too strongly into the experiment: manage expectations that Cred may swing, and don’t spend a whole week retroactively creating initiatives. Instead, take a handful and see if we can make it work. Maybe do 1-2 grain distributions with that enabled and then end the experiment. Cred would go back to the way it was, and weights may need reverting too. From there, it’s either completing the master implementation, or repeating that kind of experiment with a new design idea if approved.

I don’t know which would be better, but I think the same scaling limits would push toward such an approach.

Here’s a simpler proposal: we keep paying based on the CredSperiment scores in master, but we simultaneously publish the experimental instances. Then, the community can look at and discuss the experimental branches. If there’s consensus that an experiment is producing better results, we can prioritize merging it into master, and because of how the “catch up” grain distribution mechanic works, in the long run there’s not a big difference between paying for experiments upfront vs. paying according to the results after they’ve been merged.


We’ve just had an opportunity to observe the lifecycle of a code experiment: namely, vsoch/cred-action, a GitHub Action for computing cred at regular intervals. Here’s my attempt at a retrospective.

What went well:

  • @anon60584824 was able to prototype a GitHub Action for computing cred! It works, and is running on Metagame (with help from @Beanow—thanks!), providing real value.
  • @anon60584824 was (presumably) not encumbered by review processes, enabling rapid experimentation. The code was ready when we needed it.
  • @anon60584824 carefully reviewed my changes to the code, asking some good questions and learning a few things.

What could have gone better:

  • There was an expectation that the repository would be adopted wholesale without going through standard review:

    I’d like to suggest the following to move forward:

    • move the cred-action to be under the sourcecred organization - this also means update the example workflows (here and in the examples folder) to use it.
  • Because of urgency, there was pressure on me to review the code quickly. The overall structure of the code was (unsurprisingly :slight_smile: ) good, but there were a bunch of inline issues that needed to be addressed before merging.

    Most of these would have been better suited as pull request review comments, for two reasons. First, the PR author would have been better able to learn from the comments by responding to them themselves. Second, the time commitment to make and test the fixes would have been on the PR author, as it usually is, rather than the reviewer. Note that this isn’t just “who spends the time?”: the PR author has a considerable advantage here due to context. I spent about 4 hours making these changes, including familiarizing myself with the code at a level required to patch it, trying to figure out how to test it, writing commit messages, and responding to review comments. The PR author would have had a lot of this “in cache” already, saving time and effort overall; review comments could have just been “Done.”.

    But the code had already been merged, so this was not an option.

  • While reading the code to review it and make my edits, I had to read the whole repository at once and figure out the structure myself. If this had been reviewed by the normal process, the author would have broken the changes up into small, self-contained commits, each with a clear purpose and straightforward test plan. As always, we prioritize making code easy to review rather than making it easy to write, while prototype code is (kind of by definition) optimized for ease of writing rather than ease of reading. This impedance mismatch was never resolved.

  • Because the repo didn’t go through the standard review process, some parts of the Git history don’t follow SourceCred conventions. In particular, the repository contains merge commits, which we do not use. We can fix issues in the code after merging (as we did), but changing the Git history is much more intrusive.

Given this experience, I think that I still believe what I wrote in my original comment requesting clarity about the graduation process. Prototyping is great for validating an approach early; once the approach is validated, prototyped code should be productionized just like any other code, going through the same review processes.

(For instance, @decentralion and I hacked on a major restructuring of the core algorithm during CredCon. Our work is on a temporary shared branch, but there’s no way that we’ll merge it as is!)

I welcome comments!


Cool, good to know—thanks for clarifying. You’re right; I should probably have probed more there before assuming, even though it was a bit hectic on that particular day.

No worries, and it wasn’t from you; @decentralion and I were talking about what to do and agreed that we wanted to get it reviewed and merged quickly, not because of any exhortations from the PR author but because we thought that we might need the change quickly. This is mostly an area of improvement for us two, I think: we shouldn’t have needed to feel that urgency, because the prospective clients could always have just used a version of the action on a branch on your repo, which in fact they ended up doing.

Also good to know—thank you for sharing. :slight_smile:

Yes, absolutely.

(at the risk of a slight tangent) I’m not sure that I totally understand what you mean, but, generally speaking, we have two merge strategies:

  • For most changes, use one pull request per commit, update that pull request with any changes, and use the “squash and merge” button to create a single commit in master.

  • For changes where you want a sequence of commits in master but it doesn’t make sense to open and separately view a sequence of pull requests, create a single pull request, update that pull request (via force-push) with any changes, and use the “rebase and merge” button to rebase the sequence onto master. (E.g.: #265.)

If you’re planning to use the “squash and merge” button, whether you manually rebase on master and squash prior to clicking the button is not super important.
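For the “rebase and merge” flow, the local force-push update would look roughly like this (a sketch; my-feature is a placeholder branch name):

```
git checkout my-feature
git fetch origin
git rebase origin/master       # replay the commit sequence onto the latest master
git push --force-with-lease    # update the open pull request in place
```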

Agreed, and that totally makes sense. I meant more to look to this as an example of “experiments in general” rather than “something that followed the specific Experimental Prototype proposal in this RFC”. I don’t mean to temper your enthusiasm at all, and I don’t think that you (or anyone) did anything wrong, and certainly wouldn’t say that “it was all wrong”. I guess that I would say that my takeaways are:

  • Pull requests that provide value from highly motivated contributors are excellent and useful, and the current organic process by which they’re created seems to be working well.
  • It’s rarely the case that something needs to be literally in master for it to be depended on for some deadline. You can use branches to push to temporary Docker tags, separate GitHub Pages sites, or refs for GitHub Actions. We (maintainers) shouldn’t self-impose pressure to accelerate review past its natural pace just because the code is already there.

Thanks for engaging constructively in this retrospective, and also, of course, for all the underlying work!
