Me, blogging!

thoughts, science, code, jobs, and thoughts

Cookiecutter for a more transparent and reproducible science

Last week I watched an inspiring tutorial ("Data Science is Software") from SciPy 2016. It reminded me of the debate about how reproducible science really is. It seems that the majority of scientists agree that there is a reproducibility crisis.

That is indeed frustrating, not least because research itself should be transparent. From my own experience, I also understand why this happens: Imagine you did some fine research and saved your results for a later publication. The results were hard to obtain, either because of extensive data analysis, lengthy computer simulations, or scraping data from a larger database. Later that year you realise that some of the results are corrupt, or you made a mistake in the analysis, or you had a wrong configuration. So you try to reproduce your own results. However, this is a tedious task: you go back to the beginning and try to repeat every single step to get to your final result. Fine! But wait. Why not simply automate this process from the beginning? Yeah, sure, but how?

Cookiecutter!

So I adopted the philosophy of the guys from the video above, which states that in order to deliver reproducible (data) science you need "A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.", and developed a similar cookiecutter project template aimed at scientists.

You can find the source on GitHub and simply use it to set up your next science project:

cookiecutter gh:mkrapp/cookiecutter-reproducible-science

This tool provides the basic structure of your research project. You can adjust it to your needs and fill it with your own reproducible workflow. You can add a customized Makefile, your preferred command line tools, scripts, or model source code. You can also add raw and processed data, document your work, and even write your scientific paper within this structure.
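
To give a rough idea of what one automated step in such a workflow could look like, here is a minimal Python sketch. It rebuilds a processed file from raw data only when the processed file is missing or older than its input, which is the same rule a Makefile applies. The paths data/raw/measurements.csv and data/processed/summary.csv, the column name "value", and the function names are assumptions made up for this illustration; they are not part of the template.

```python
import csv
from pathlib import Path


def is_stale(target: Path, source: Path) -> bool:
    """Return True if target is missing or older than source."""
    return not target.exists() or target.stat().st_mtime < source.stat().st_mtime


def process(raw: Path, processed: Path) -> bool:
    """Rebuild processed from raw if needed; return True if work was done."""
    if not is_stale(processed, raw):
        return False  # output is up to date, nothing to do

    # Hypothetical analysis step: average the "value" column of the raw CSV.
    with raw.open(newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    mean = sum(values) / len(values)

    processed.parent.mkdir(parents=True, exist_ok=True)
    processed.write_text(f"mean,{mean}\n")
    return True


if __name__ == "__main__":
    # Hypothetical layout: raw data goes in, processed data comes out.
    raw_file = Path("data/raw/measurements.csv")
    if raw_file.exists():
        rebuilt = process(raw_file, Path("data/processed/summary.csv"))
        print("rebuilt" if rebuilt else "up to date")
```

Because every result is produced by a script rather than by hand, fixing a mistake in the analysis means editing the script and running it again, not repeating each step from memory.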

Let's do more reproducible and transparent science!!!

PS: I'm currently adapting one of my own science projects and will provide the final reproducible version later on (I need to finish it first).

Tags: design, productivity, reproducibility, github 2016/07/25
Keeping (time) track of different projects

I work on several projects, and it is not easy to keep track of the time spent on each of them. There are, of course, tools that handle time tracking for you. But they are either aimed at freelancers, too complicated, or expensive. Here's an alternative: go-watch.

I've tried Go because the compiled executables can be deployed everywhere. Try that, Python! In principle you would just need a compiled version for your operating system, but here's the catch: try to install it on your own.

All you need is a Go installation and a clone of the go-watch project:

git clone https://github.com/mkrapp/go-watch.git

Screenshot of go-watch

Tags: productivity, go, code 2015/10/06
About Mario Krapp
I'm a physicist by training and graduated in Earth System Science.

And I like coding. I've been working with complex computer models ever since my undergrad and I enjoy data exploration and data analysis to gain insights into the underlying principles.

Feel free to contact me.
