Backups and forking
In yesterday’s community call, we talked about forking as the final “plan B” for open source communities and the cost of it. Generally speaking, the lower the cost of forking, the more it empowers the community.
Thinking about what it would take to “fork” a Discourse instance, my first thought was to use the backup tools. But I found out that backups should not be public, because they contain a lot of private information:
- contents of private messages
- plain-text admin API keys
- the user table (hashed passwords, private email addresses)
- and so on…
So in terms of forking, I think something like a static snapshot of all public data would be necessary. That’s another discussion though.
Admins have access to this data
Looking into how sensitive backups are made me realize that admins can download these backups, meaning they can see all of this data.
I think that is a massive responsibility. It’s greater than, for example, being a GitHub organization owner: the worst an owner can do is delete the whole org, but they have very limited access to private information like this. The responsibility of handling that private data lies with GitHub as a platform. (Although that could arguably be worse for your privacy.)
Trust model of Discourse Admins
Basically, this means admins require ultimate trust. Put differently, a “bad admin” could not just cause great disruption but also violate users’ privacy, both in the moral sense and in the legal sense, for example by breaching the GDPR.
Personally I would rather have a technical solution like e2e encryption to avoid needing to give this level of trust to anyone. But this is what Discourse can offer today. So I would like to discuss with everyone how we should handle this.
Minimal admins, bus factor
One option: we keep the number of admins to a minimum, because the risk of abuse increases with each one. That would leave us with a low bus factor, though. I don’t think losing private messages is the end of the world, but needing to ask all your users to sign up to a new forum and losing the public discussions would be the main disruption for me.
Extra admins to mitigate the bus factor
Following from the above option, you might argue that we should have a couple of extra admins, each keeping their own independent backups, so it’s more likely we can recover from an admin going MIA. But each of these admins would need to be trusted with access to this private data, and for the same reason they can’t share the backups in a public archive or anything like that.
Public data backups
We can look into tools that allow backups of all the public data. Unfortunately, that means we wouldn’t be able to save functional user accounts (private emails and hashed passwords are not public data). But it would enable anyone who’s interested to keep a backup of the public discussions.
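To make this a bit more concrete, here is a rough sketch of what such a tool could look like. It assumes the instance exposes the usual anonymous JSON routes (`/latest.json` and `/t/<id>.json`); the function names and the `forum.example` URL are made up for illustration, and this is not an official Discourse backup mechanism.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL without credentials, so only publicly visible data comes back."""
    with urlopen(url) as response:
        return json.load(response)


def snapshot_public_topics(base_url, fetch=fetch_json, max_pages=1):
    """Return {topic_id: topic_json} for topics on the first `max_pages` listing pages.

    Assumption: the listing JSON has the shape {"topic_list": {"topics": [{"id": ...}]}}.
    """
    snapshot = {}
    for page in range(max_pages):
        listing = fetch(f"{base_url}/latest.json?page={page}")
        topics = listing.get("topic_list", {}).get("topics", [])
        if not topics:
            break  # no more topics to list
        for topic in topics:
            snapshot[topic["id"]] = fetch(f"{base_url}/t/{topic['id']}.json")
    return snapshot


# Hypothetical usage (needs network access):
# data = snapshot_public_topics("https://forum.example", max_pages=3)
# with open("public-snapshot.json", "w") as f:
#     json.dump(data, f)
```

As far as I know, long topics are paginated in the JSON, so a real tool would also have to follow up on the remaining posts, and it should be polite about rate limits. Because it only ever sees what an anonymous visitor sees, the result could be shared publicly.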
Thoughts / more options?
I would like to hear how everyone feels we should approach this, or whether we should look into other options (some 2-out-of-3 encryption setup?).
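To illustrate what I mean by a 2-out-of-3 setup: one reading is that the backup gets encrypted with a key, and that key is split so that any two of three admins can rebuild it but no single admin can. Below is a minimal sketch using Shamir's secret sharing over a prime field. This is my own assumption about how it could work, not something Discourse offers, and the names and the choice of prime are purely illustrative.

```python
import secrets

# Assumption: a Mersenne prime as the field modulus; the key must be smaller than it.
PRIME = 2**127 - 1


def split_secret(secret, threshold=2, shares=3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbelow(PRIME)      # stand-in for a backup encryption key
parts = split_secret(key)           # one part per admin
assert recover_secret(parts[:2]) == key
```

The idea is that losing one admin doesn't lose the backup, and one rogue admin can't decrypt it alone. It doesn't solve the problem that admins can still see the data on the live site.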