I’m dusting off my Python here to do some analysis on cred scores. Thought I would ping the community for input before I start coding. I want to make sure the data structures and analysis functions I write are aligned with the needs of the broader community, and that I don’t paint myself into any corners. I’m OK with this stuff, but I’m no data scientist. These questions come to mind:
What types of questions will people ask of SourceCred data?
There are obviously a bazillion ways to slice this, but are there any burning questions that stand out? Any pressing needs I should be aware of? The main use cases I’m imagining are:

- detecting gaming,
- estimating the cred gained for a given contribution,
- creating new views into the data (e.g. leaderboards showing contributions vs. contributors), and
- looking for interesting relationships (e.g. causal relationships over time, trust levels between certain contributors).
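To make the gaming-detection use case concrete, here’s the sort of crude heuristic I have in mind as a starting point. Everything here is hypothetical: the `weekly_cred` data structure and the threshold are placeholders I made up, not anything SourceCred produces today.

```python
def flag_cred_spikes(weekly_cred, threshold=3.0):
    """Flag contributors whose most recent weekly cred exceeds
    `threshold` times their prior average.

    A crude gaming-detection sketch, not a real method.
    weekly_cred is a hypothetical structure:
        {contributor_name: [cred_week_1, ..., cred_week_n]}
    """
    flagged = []
    for name, series in weekly_cred.items():
        if len(series) < 2:
            continue  # not enough history to compare against
        prior = series[:-1]
        avg = sum(prior) / len(prior)
        if avg > 0 and series[-1] > threshold * avg:
            flagged.append(name)
    return flagged
```

Something like `flag_cred_spikes({"alice": [1.0, 1.0, 10.0]})` would flag `alice`, since her latest week is 10x her prior average. Obviously a real approach would need to account for legitimate bursts of activity.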
What data do we want to analyze?
I’m starting with the raw cred scores in scores.json, but does anyone want to dive into more granular graph data? Grain data?
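For concreteness, here’s the kind of loading helper I’m picturing. Note that the schema assumed here (a list of `{"name": ..., "cred": ...}` records) is a guess for illustration; the actual layout of scores.json may well differ, which is part of why I’m asking about stability and documentation below.

```python
import json

def load_scores(path="scores.json"):
    """Load cred scores into a {name: cred} dict.

    Assumes a hypothetical schema: a JSON list of
    {"name": ..., "cred": ...} records. The real
    scores.json layout may differ.
    """
    with open(path) as f:
        data = json.load(f)
    return {entry["name"]: entry["cred"] for entry in data}

def top_contributors(scores, n=10):
    """Return the n highest-cred contributors as (name, cred) pairs."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With helpers like these, a leaderboard view is just `top_contributors(load_scores(), 25)`.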
What format do we want the data in?
I’m imagining it will be useful to package data into easily analyzable formats. For example, I’m thinking it would be interesting to analyze Maker forum activity during Black Thursday and create a post about it on their forums. That community is likely to want to dig in and play with the data, but tossing them a big gnarly JSON file with no documentation will exclude a lot of people. Do we want to create human-readable CSV files? Something else?
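As a straw proposal for the CSV idea, something like this sketch is what I mean by “human readable.” The column names and the `{name: cred}` input shape are placeholders I picked for illustration, not a proposed standard:

```python
import csv

def write_scores_csv(scores, path):
    """Write a {name: cred} mapping to a simple CSV,
    sorted by cred descending.

    Column names are placeholders, not a proposed standard.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["contributor", "cred"])
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        for name, cred in ranked:
            writer.writerow([name, round(cred, 2)])
```

The output opens cleanly in a spreadsheet, which is probably the lowest barrier to entry for folks who want to play with the data without touching code.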
Do we want to document current data models?
Do we want to create low-level technical documentation around the data models and formatting? I would find it useful right now myself, but I’m not sure if it’s overkill. Also, are the data objects (e.g. scores.json) stable, or are they still shifting as we add new functionality? I’m imagining creating at least high-level documentation around what data lives in what files, but wanted to get input here before creating any Issues on sourcecred/docs.