I’m not sure if this old-ish thread represents an idea that has already been implemented or further evolved elsewhere, but I just signed up to this forum to say that this is exactly the solution I wanted to see in SourceCred! I have some further suggestions.
60/40
I’d propose that we start with 60% of the cred flowing to the author and 40% to the references. Not a big change, but then we can always say the author gets “most” of their cred and the minority goes to references, which feels better than “you only get half” (though 40% is a huge minority).
Allocation Incentives
If the contribution is unreferenced, the reference share goes back to the seed node, which I will call the “drain”, though I know that this actually just “genericizes” the cred, sending it to the whole project indiscriminately. We can explain to authors that by linking no references, they merely give up their opportunity to say where that cred goes. They are incentivized to add references. Perfect.
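To make the split concrete, here is a minimal sketch of how I picture it. The function name, the dictionary layout, and the equal split between references are my own assumptions for illustration, not anything SourceCred does today.

```python
AUTHOR_SHARE = 0.6  # proposed above; the remaining 40% is the reference share


def split_cred(cred, references):
    """Split one contribution's cred between its author, its references, and the seed node.

    Hypothetical helper: references are assumed to share their pool equally.
    """
    author = cred * AUTHOR_SHARE
    reference_pool = cred - author
    if not references:
        # No references named: the whole reference share drains to the seed node.
        return {"author": author, "seed": reference_pool, "references": {}}
    per_reference = reference_pool / len(references)
    return {
        "author": author,
        "seed": 0.0,
        "references": {ref: per_reference for ref in references},
    }


unreferenced = split_cred(100, [])
referenced = split_cred(100, ["design-doc", "bug-report"])
```

With no references, the author still gets their 60 and simply loses the say over the other 40.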
The only remaining problems are around where authors direct that cred:
- They can bias their choices toward references that point back to themselves.
- They can under-report references out of mere laziness, even though reporting all of them is better.
- They can under-report references out of respect for the major ones, since minor ones take the same slice.
- They can report references exhaustively, even though a major reference is wildly more important than any of the minor ones (say, a slight influence vs. a critical backbone).
Loopback factor (mitigate cred traps)
To mitigate the first problem, I suggest every potential reference target be given a “loopback” score calculated for the given author. Think of it as microphone feedback - the mic can always hear the speaker at some negligible level, but only when the gap between them shrinks past a certain threshold does the chain reaction become audible, and before you know it you have a horrible screeching noise.
We calculate a reference’s graph connections back to the author. If there are none, the loopback factor is zero. If there is one, we calculate its length in hops and weight it accordingly, where 0 hops (straight back) is maximum loopback and half the total node count is minimum loopback (the cred would have to travel through a loop so big that it involves half of all the nodes just to get back).
Alternatively, we can use the leakage at each node to determine how many hops it would take before the cred is diluted so much that it doesn’t matter - and then only look for loops smaller than that.
Finally, if there are multiple unique loopback paths of significant strength (which is perfectly normal), we combine them by weight, to get the reference’s final loopback factor.
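Here is a rough sketch of the calculation described above, under several assumptions of mine: the graph is a plain dict mapping every node to the nodes it sends cred to, a “hop” is an intermediate node between the reference and the author, path weight falls off linearly with hops, and paths are combined as 1 minus the product of (1 - weight) so the result stays between 0 and 1. The post only says “combine them by weight”, so that last choice in particular is a guess. The helper `max_relevant_hops` covers the leakage alternative, assuming every node retains the same fraction of cred.

```python
import math


def max_relevant_hops(retained_per_hop, threshold=0.01):
    """Smallest hop count after which at most `threshold` of the cred survives.

    Assumes every node passes on the same fraction `retained_per_hop`.
    """
    if not (0 < retained_per_hop < 1 and 0 < threshold < 1):
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(threshold) / math.log(retained_per_hop))


def loopback_factor(graph, reference, author, max_hops=None):
    """Score in [0, 1] for how strongly cred sent to `reference` flows back to `author`."""
    if reference == author:
        return 1.0
    if max_hops is None:
        max_hops = len(graph) // 2  # "half the total node count" is minimum loopback
    if max_hops <= 0:
        return 0.0

    weights = []

    def walk(node, visited, edges_so_far):
        for nxt in graph.get(node, ()):
            if nxt == author:
                hops = edges_so_far  # intermediate nodes on this path
                if hops < max_hops:
                    weights.append(1 - hops / max_hops)  # 0 hops -> weight 1
            elif nxt not in visited and edges_so_far + 1 < max_hops:
                walk(nxt, visited | {nxt}, edges_so_far + 1)

    walk(reference, {reference}, 0)

    # Combine the unique paths: each extra path can only raise the factor.
    remaining = 1.0
    for weight in weights:
        remaining *= 1 - weight
    return 1 - remaining


graph = {
    "alice": [],
    "direct": ["alice"],           # points straight back to the author
    "indirect": ["mid"],           # reaches the author through one other node
    "mid": ["alice"],
    "multi": ["mid", "mid2"],      # two separate one-hop paths
    "mid2": ["alice"],
    "unrelated": ["other"],
    "other": [],
}
factors = {
    ref: loopback_factor(graph, ref, "alice", max_hops=4)
    for ref in ("direct", "indirect", "multi", "unrelated")
}
```

Enumerating every simple path gets expensive on a big graph, which is one more reason to keep `max_hops` small, for example by deriving it from the leakage.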
A loopback factor of 0 means a reference gets 100% of the cred due, as usual. At first, this decreases only slightly as the factor rises; I’m imagining a cubic curve would do the trick. As the loopback factor gets close to the “suspicious zone”, the curve gets steep, then flattens out near the top in the “obviously self-serving” zone. Near that point, only a small amount (say 8%) of the total reference cred gets considered for this reference.
The other 92%, however, is up for grabs to the other references, if they exist! The seed node takes whatever is left.
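One way to read this, sketched below with my own assumptions: the cubic is the smoothstep polynomial 3x² - 2x³ (flat at both ends, steep in the middle), scaled so the multiplier runs from 1.0 down to the 8% floor. Each reference is then capped at its multiplier times the reference pool, references share the pool in proportion to their multipliers, and anything the caps leave unclaimed goes to the seed node. The function names are hypothetical and the loopback factors are taken as given inputs.

```python
FLOOR = 0.08  # the "say 8%" from the post


def loopback_multiplier(factor, floor=FLOOR):
    """Map a loopback factor in [0, 1] to the fraction of cred a reference may claim."""
    x = min(max(factor, 0.0), 1.0)
    smooth = 3 * x**2 - 2 * x**3  # cubic: gentle near 0, steep mid-range, flat near 1
    return 1 - (1 - floor) * smooth


def allocate_reference_pool(pool, loopback_factors):
    """Split `pool` among references; return (shares, amount left for the seed node)."""
    multipliers = {ref: loopback_multiplier(f) for ref, f in loopback_factors.items()}
    total = sum(multipliers.values())
    # If the multipliers sum to less than 1, each reference is held to its cap
    # and the remainder goes to the seed; otherwise the pool is shared pro rata.
    scale = pool / max(1.0, total)
    shares = {ref: m * scale for ref, m in multipliers.items()}
    seed = pool - sum(shares.values())
    return shares, seed


only_self, seed_alone = allocate_reference_pool(40, {"my-old-post": 1.0})
mixed, seed_mixed = allocate_reference_pool(40, {"my-old-post": 1.0, "someone-else": 0.0})
```

In the first case the self-reference gets 8% of the 40 and the seed takes the rest; in the second, nearly all of the pool moves to the other reference and nothing is left for the seed.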
This way, it’s not pointless to reference yourself (which may be totally valid anyway) but not lucrative either. It’s also somewhat less lucrative to indirectly reference yourself, proportional to just how tightly knit the feedback loop is, but still totally worth it when you don’t do it too often.
The inverse of the loopback factor can be thought of as a diversity score - better yet, such a score could consider whether you’ve ever linked this far away on the graph before. Encouraging diverse references and countering local bias!
Not all references are created equal
To solve the remaining issues in the allocation bullet list above, what if we split references into different types? I understand this is already the case outside the OP example, but the critical piece is that there is a way to account for some “references” being far more important than others, so that people don’t leave out the little guys just to make sure the big ones get a big chunk.
Perhaps just three categories would do: 3 for the giants whose shoulders the author stood on (dependencies and such), 1 for the passing influences, like minor wording tweaks, coloring ideas, or emotional support, and 2 for everything in between.
These could literally get the 40% slice of the pie in a 3:2:1 ratio accordingly, and that might just work as-is. Success would mean that authors are always naming all their references, with so little to gain by culling them that they don’t bother trying.
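As a quick sketch of that ratio, assuming each reference carries its category number as a weight and the reference pool is divided in proportion to those weights (the names here are made up for the example):

```python
TIER_WEIGHTS = {3: 3, 2: 2, 1: 1}  # category -> relative weight, i.e. the 3:2:1 ratio


def split_by_tier(pool, reference_tiers):
    """Divide `pool` among references in proportion to their category weight."""
    total = sum(TIER_WEIGHTS[tier] for tier in reference_tiers.values())
    if total == 0:
        return {}
    return {
        ref: pool * TIER_WEIGHTS[tier] / total
        for ref, tier in reference_tiers.items()
    }


shares = split_by_tier(40, {"core-library": 3, "code-review": 2, "typo-fix": 1})
```

Here the backbone reference takes half the pool, yet naming the typo fix only costs it a small slice, so there is little reason to leave the minor one out.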