At dinner with @3blue1brown tonight, I described the weight composition problem to him.
He suggested viewing the weights / heuristics as unit vectors (in node-space), and viewing adding a heuristic as weighted interpolation between the current vector and the vector defined by the new heuristic. If the new heuristic isn’t defined for every node, then you just interpolate on the dimensions the new vector is defined for, while staying on the unit simplex in the whole space.
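Here is a minimal Python sketch of one way to read that suggestion. It is not a worked-out design. The function name `blend_heuristic` and the parameter `alpha` (the interpolation strength) are made up for illustration. The main assumption is how to "stay on the unit simplex" when the new heuristic only covers some nodes: the sketch rescales the heuristic so it carries the same total mass the current weights already have on those nodes, then interpolates only there, so the overall sum stays 1.

```python
def blend_heuristic(weights, heuristic, alpha):
    """Interpolate `weights` toward `heuristic` on the nodes the heuristic covers.

    weights:   dict node -> non-negative weight, summing to 1 (a point on the simplex)
    heuristic: dict node -> non-negative score, possibly for only some nodes
    alpha:     interpolation strength in [0, 1] (hypothetical parameter)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    # Only the dimensions where the new heuristic is defined take part.
    shared = [node for node in heuristic if node in weights]
    mass = sum(weights[node] for node in shared)      # current mass on those nodes
    total = sum(heuristic[node] for node in shared)   # raw heuristic mass

    blended = dict(weights)  # untouched nodes keep their weight
    if total <= 0:
        return blended       # heuristic carries no information; nothing to blend

    for node in shared:
        # Assumption: rescale the heuristic to the same mass as the current
        # weights on the shared nodes, so the full vector still sums to 1.
        target = heuristic[node] / total * mass
        blended[node] = (1 - alpha) * weights[node] + alpha * target
    return blended


weights = {"a": 0.5, "b": 0.3, "c": 0.2}
heuristic = {"a": 1.0, "b": 3.0}  # not defined for "c"
result = blend_heuristic(weights, heuristic, alpha=0.5)
```

One consequence of this particular reading: a partial heuristic can only redistribute weight among the nodes it covers, never move weight to or from the others. Whether that is the desired behaviour is an open question.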
I haven’t fully grokked this yet or worked through the implications, but I think this is a new and promising approach. Thanks @3blue1brown!