SourceCred as an incentive compiler

Copying these over from the “The power of knowledge graphs” thread. It would be awesome to have a blog post on SourceCred’s amazing ability to align incentives in a positive-sum way :slight_smile:


Always thought the incentive compiler meme was strong. It will appeal especially to anyone who’s done development.

I would say that SourceCred 1.0 (the scores within a repo/Discourse) is like a working low-level language compiler (e.g. assembly language -> 101000101). We’re now working on high-level compilers (e.g. C++ -> 101000101). High-level compilers are much harder, but more widely applicable.

Speaking as someone who loves writing compilers, I’m intrigued by this idea, and would like to probe it. Please, let me invite you to my mental model.

A language is an assignment of semantics to syntax. A compiler is a transformation from syntax in one language to syntax in another (maybe the same) language that preserves the semantics.

With this framing, a bunch of common tools are seen to be special cases of compilers:

  • An optimizer takes one program and spits out a new one with the same behavior (semantics), but that generally runs faster than the original program.
  • An obfuscator is a compiler whose output is harder to understand than its input.
  • A minifier is a compiler whose output is smaller than its input.
  • A prettifier is a compiler whose output is more consistently formatted than its input, and maybe more aesthetically pleasing.
  • An assembler is simply a compiler whose output language is called “assembly”, and likewise for a disassembler.
  • A decompiler is just a compiler.
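To make the definition concrete, here is a minimal Python sketch. The toy language is my own invention for illustration: the classes `Num`, `Var`, `Add`, `Mul` and the functions `evaluate` and `optimize` are hypothetical, not taken from SourceCred or any other codebase. `evaluate` assigns the semantics (the integer an expression denotes), and `optimize` is a compiler from the language to itself that changes the syntax but not that meaning.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical toy language: integer arithmetic over literals and variables.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, Add, Mul]
Env = Dict[str, int]

def evaluate(e: Expr, env: Env) -> int:
    """The semantics: the integer an expression denotes in an environment."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    return evaluate(e.left, env) * evaluate(e.right, env)

def optimize(e: Expr) -> Expr:
    """A compiler from the language to itself: folds literal-only
    subexpressions and drops `+ 0` and `* 1`, keeping the meaning intact."""
    if isinstance(e, (Num, Var)):
        return e
    left, right = optimize(e.left), optimize(e.right)
    is_add = isinstance(e, Add)
    if isinstance(left, Num) and isinstance(right, Num):
        return Num(left.value + right.value if is_add else left.value * right.value)
    identity = 0 if is_add else 1  # x + 0 == x and x * 1 == x
    if isinstance(left, Num) and left.value == identity:
        return right
    if isinstance(right, Num) and right.value == identity:
        return left
    return type(e)(left, right)

program = Add(Mul(Num(2), Num(3)), Mul(Var("x"), Add(Num(1), Num(0))))
optimized = optimize(program)
print(optimized)  # Add(left=Num(value=6), right=Var(name='x'))

# Correctness of this compiler: same denotation in every environment we try.
for x in range(-5, 6):
    assert evaluate(optimized, {"x": x}) == evaluate(program, {"x": x})
```

The sampled check at the end is evidence, not proof; a real correctness argument would go by induction on the structure of the expression.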

But note that I haven’t said anything about programming languages. As formulated, this applies to natural languages, too, and it applies to languages that aren’t programming languages or natural languages. The GraphQL Mirror module has a method extract that transforms a subset of a SQL database into a JavaScript object graph, preserving the associated semantics at each end. The TensorBoard data_compat module exposes a function that transforms “v1-style summaries” to “v2-style summaries” while preserving their semantics. These are compilers, and this generalization is critical.

A while ago, on the topic of whether various interactions constitute transactions, @decentralion said:

This is an astute point, and I’d like to refine it a bit. Let’s move up a level. A definition is useful if you can reason about generic instances of the term and apply that reasoning to specific instances. An instance of the definition is useful if the high-level reasoning transfers faithfully and provides insight.

For instance, the above definition of compiler is useful because we can make the general statement that “compilers compose”: if you have a correct compiler from A to B and a correct compiler from B to C, then you also have a correct compiler from A to C, and you know exactly what it does. The classifications of, say, obfuscators and optimizers as compilers are both useful because we now know what happens if you take a program, run it through an optimizer, and then run that through an obfuscator: you have a semantically equivalent program that likely runs faster and is harder to read, because compilers compose.
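The “compilers compose” argument can be sketched in Python as well. Again, the tuple-encoded toy language and the functions `optimize`, `obfuscate` and `compose` are hypothetical illustrations, not anyone’s real API. `compose` is plain function composition, which is the point: if each stage preserves `evaluate`, so does the pipeline.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy language, encoded as nested tuples:
#   ("num", n) | ("var", name) | ("add", e1, e2) | ("mul", e1, e2)
Expr = Tuple
Compiler = Callable[[Expr], Expr]

def evaluate(e: Expr, env: Dict[str, int]) -> int:
    """The semantics: the integer an expression denotes in an environment."""
    tag = e[0]
    if tag == "num":
        return e[1]
    if tag == "var":
        return env[e[1]]
    a, b = evaluate(e[1], env), evaluate(e[2], env)
    return a + b if tag == "add" else a * b

def optimize(e: Expr) -> Expr:
    """Optimizer: folds any add/mul whose operands are both literals."""
    if e[0] in ("num", "var"):
        return e
    a, b = optimize(e[1]), optimize(e[2])
    if a[0] == "num" and b[0] == "num":
        return ("num", a[1] + b[1] if e[0] == "add" else a[1] * b[1])
    return (e[0], a, b)

def obfuscate(e: Expr) -> Expr:
    """Obfuscator: rewrites every literal n as the equivalent (n - 7) + 7."""
    if e[0] == "num":
        return ("add", ("num", e[1] - 7), ("num", 7))
    if e[0] == "var":
        return e
    return (e[0], obfuscate(e[1]), obfuscate(e[2]))

def compose(first: Compiler, second: Compiler) -> Compiler:
    """Run `first`, then `second` on its output. If both preserve
    `evaluate`, the composite does too."""
    return lambda program: second(first(program))

optimize_then_obfuscate = compose(optimize, obfuscate)

program = ("add", ("mul", ("num", 2), ("num", 3)), ("var", "x"))
result = optimize_then_obfuscate(program)
print(result)  # ('add', ('add', ('num', -1), ('num', 7)), ('var', 'x'))

for x in range(-5, 6):
    assert evaluate(result, {"x": x}) == evaluate(program, {"x": x})
```

Order matters for the non-semantic properties, not for correctness: `compose(obfuscate, optimize)` also preserves the meaning, but the optimizer folds the noise right back out and returns `("add", ("num", 6), ("var", "x"))`.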

Personally, I derive great value from this broad definition of compiler. When I want to write a compiler, I know how to structure its internals, how to test its correctness, what it even means for it to be correct, how it should interface with outside systems. In fact, I find this conceptual framing so useful that whenever I’m working, I’m generally running a low-priority background thread called “find the compiler”, in which I try to identify how the problem that I’m trying to solve can be expressed in this framework, after which it becomes basically a “known problem” whose solution I just have to implement.

So: Is SourceCred an incentive compiler? That depends. What language is it compiling from, and what’s it compiling to? What semantics is it preserving, and what non-semantic transformations is it making, if any? And how does this help us reason about what it is or should be?


(Exercise for the interested reader: In the context of programming, what are “compiled languages” and “interpreted languages”?)


I love taking metaphors until they break :smiling_imp:

I was definitely thinking of it in the most common sense of the word, which is as an assembler: taking a high-level desired behavior and ‘compiling’ it into more granular, lower-level incentives that aim to produce, in aggregate, the desired behavior. For the core algorithm, which actually generates scores that are usually meaningful enough to be actionable (though typically only within a single repo/Discourse instance, as activity differs in nature across repos), it’s a bit fuzzy, because the semantics are fuzzy. We’re looking at what’s been valued in the past and saying, “More of that please…”. It remains to be seen whether we’ll actually get more of that in the wild, but let’s say for the sake of argument that it works well enough. This seems almost more like AI, which learns a model by being fed “successful” data. It “compiles” code in much the same way such a model can generate target images from other images.

I suppose one can also say that new Issues, PRs, etc. are a more specific set of semantics guiding behavior. In this case the language is English (or another spoken language), but also, in my experience in OSS, the language of doing. It is concrete actions, guided by Issues, etc., or self-created but still valuable contributions, that generate :heart:s, PR reviews, and other actions that confer meaningful cred, driving the incentives.

When Initiatives come online, they will act more like high-level programming languages, giving more specific semantics (desired behaviors).

These seem kind of the same, as both compile high-level languages to low-level languages; one just does it ahead of time in a more optimal manner, and the other does it on the fly. Maybe Initiatives are more like compiled languages, and the base SC algorithm is like a run-time compiler, with the other “developers” entering “commands” in the form of Issues, comments, likes, etc.

This