Kia ora all, here is some context and narrative on why I decided to create this project.
I am a little homesick and want to do something about it
I have been feeling a little desolate about the fact that I live so far away from home at the moment and cannot contribute to our country. I do realise that this sounds a bit ridiculous, but I have always felt that simply by living there and doing little things I am helping make New Zealand New Zealand. Being located so far away makes the ‘physical’ contributions impossible. I miss being around our beautiful fauna and the lovely, amazing people that make our country so unique. I decided to embark on a personal project that will help me feel connected to home, and ideally will help me feel like I am ‘contributing’ in my own way. I have been racking my brain over this for a little while because I kept associating ‘contributing’ with being public, speaking publicly, posting in places, etc. I am really not much of a public speaker and don’t want to edit videos or something (maybe in the future), so I have been trying to figure out what I can contribute with the skillset that I have.
We have a problem
When I was at uni I worked on a project that created a scoring system indicating the risk that a person would develop certain diseases. The original scoring system was based on genetic data from the US that came predominantly from white US’ians (read this to learn more). However, different ethnicities and different community compositions influence this scoring system, and Aotearoa’s ethnic diversity and our community/social composition are completely different to those used to generate the risk scores. This essentially renders the scoring system useless for us. This is just one example, but there is so much other information out there that is based on the people and community structures of other places. Furthermore, this issue persists in all areas of research, e.g. population health, environmental conservation (animals, plants), economics.
Why we should care
The issue is that research is used for information. It informs action by oneself, by the community, and by the government. In the risk-score example, the scores enable people to take preventative measures. For the government, risk-scoring patterns enable public funding to be placed where it is relevant and in proportion to the representation within the community. New Zealanders and our government being unable to use this scoring system (or having to use a generalised version of it) results in us spending money, resources, and energy on less relevant issues, and in us not giving people the opportunity to be proactive.
What we are currently missing
I firmly believe that the ability to make good decisions lies in information, but information needs to relate to us rather than be based on ‘trickle down’ findings from other places, as illustrated in the example above. Information and education for Aotearoa need to be informed mostly by research that accommodates the unique set of parameters specific to our remote location. And of this we currently have too little.
How can we get there
What we do have is data, and some of it is publicly available, thanks to the different government departments and public research institutes. What I originally wanted to do was run some analyses and share the insights somehow, but I then realised that all this data is spread across too many different platforms and comes in many different file types. It is not possible to ‘easily’ create an analysis (as I would normally do at work) because the data is not collected in the same place or prepared for easy use.
I think I can do something about this
I would love to enable other analysts to contribute so that our education and information do not rest solely on the shoulders of academia and public researchers. I would love to facilitate information flow to help educate our people. I decided that I will try to get the ball rolling by creating a place that collects the data we have publicly available. The place should be accessible and easy to use, and should make it easy to connect the datasets with each other. Then we can use this as a baseline to come together as a community to analyse, interpret, and inform each other.
The end goal for the project
What I would love to get out of this is a repository that can work for anyone with approximately my skillset. I want to include tools that I have learnt about and found useful as an analyst. Once it is set up, I am excited to actually get to do my favourite mahi, i.e. get down and dirty with all the data we have available and see what I can find. But really, I would love it if more people than just me did so.
Why document this
As I started, I realised I really know shit all about data engineering and it’s low-key difficult. So I am documenting how all this works in case I want to do it again, or to show it to someone else who might want to do the same thing. Of course, I also want to document the insights (once I get to that part). Without further ado: here’s the documentation!