Me, blogging!

thoughts, science, code, jobs, and thoughts

Impressions with cookiecutter for reproducible science

I managed to be strict about reproducible science. The paper I was working on with my co-authors (SEMIC: an efficient surface energy and mass balance model applied to the Greenland ice sheet) is my first example showing that reproducible science is fun, and I will definitely keep going in that direction. The paper did take much longer than I had anticipated (at home it was colloquially dubbed the "Rohrkrepierer", German for a dud), but our Open-Science approach is not to blame.

So we can check off one item from the previously mentioned bullet points: cookiecutter for reproducible science serves a good purpose. Since then, literally all of my projects have started with this cookiecutter, even the smallest ones. My project folder is now fully organized, each project has its README.md, and I know where to look for each file. Some of my projects "in production" are easily converted into version-controlled projects by simply typing

git init .

I very much recommend it. Happy cookie cutting!

Tags: reproducibility, github 2017/05/16
Cookiecutter for a more transparent and reproducible science

Last week I watched an inspiring tutorial ("Data Science is Software") from SciPy 2016. It reminded me of the debate about how reproducible science really is. It seems that the majority of scientists agree that there is a reproducibility crisis.

That is indeed frustrating, not least because research itself should be transparent. I also understand from my own experience why that is the case. Imagine you did some fine research and saved your results for a later publication. The results themselves were hard to obtain, either because of extensive data analysis, lengthy computer simulations, or scraping data from a larger database. Later that year you realise that some of the results are corrupt, or you made a mistake in the analysis, or you had a wrong configuration. In this case you try to reproduce your own results. However, this is a tedious task, so you go back to the beginning and try to repeat every single step to get to your final result. Fine! But wait. Why not simply automate this process from the beginning? Yeah, sure, but how?

Cookiecutter!

So I adopted the philosophy of the people from the video above, which states that in order to deliver reproducible (data) science you need "A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.", and developed a similar cookiecutter project template aimed at scientists.

You can find the source on GitHub and simply use it to set up your next science project:

cookiecutter gh:mkrapp/cookiecutter-reproducible-science

This tool provides you with the basic structure of your research project. You can adjust it to your needs, and you need to fill it with your own reproducible workflow. You can add a customized Makefile, your preferred command line tools, scripts, or model source code. You can also add raw data and processed data. You can document your work and also write your scientific paper within this structure.
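To give a rough idea of what such a workflow step can look like, here is a minimal sketch in Python of a Makefile-style rule: a processed file is rebuilt only when it is missing or older than the raw data it depends on. The function names and the processing step (averaging one number per line) are hypothetical and only for illustration; they are not part of the template.

```python
from pathlib import Path


def needs_rebuild(source: Path, target: Path) -> bool:
    """Return True if target is missing or older than source (the rule make applies)."""
    return not target.exists() or target.stat().st_mtime < source.stat().st_mtime


def process(source: Path, target: Path) -> None:
    """Hypothetical processing step: write the mean of the numbers found in source."""
    values = [float(token) for token in source.read_text().split()]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(f"{sum(values) / len(values)}\n")


def build(source: Path, target: Path) -> bool:
    """Run the processing step only when needed; return True if it actually ran."""
    if needs_rebuild(source, target):
        process(source, target)
        return True
    return False
```

Calling `build()` twice on the same pair of files runs the step once; the second call is skipped until the raw file changes. A real Makefile gives you the same behaviour for free, which is why it fits so well into this kind of project structure.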

Let's do more reproducible and transparent science!!!

PS: I'm also currently adapting one of my science projects, and I'm going to provide the final reproducible version later on (I need to finish it first).

Tags: design, productivity, reproducibility, github 2016/07/25
About Mario Krapp
I'm a physicist by training and graduated in Earth System Science.

And I like coding. I've been working with complex computer models ever since my undergrad and I enjoy data exploration and data analysis to gain insights into the underlying principles.

Feel free to contact me.
