Being a good researcher requires being (or having) a good data manager, because there is so much data to manage. First, when setting up your research, you need data on what has been done, using which methods, and by whom. Next, you may need to perform pilot studies to determine optimal parameters: again, data that needs managing so that you can make systematic comparisons to build your perfect experiment. Then comes the main study: you need to manage the incoming data and the data on participants, and you need to report on precisely what you have done, how you did it, how well you did it, and what the end result is.
Although this scientific process (hopefully) results in a clean and clear paper, reconstructing what has been done using which (selections of) data is often much fuzzier. That causes relatively simple requests to become big problems. How would you react if someone asked you questions like: “Can you give me the data from paper X?” or, even worse, “Can I use the task you developed and used for study X?”
Being able to quickly and adequately address these questions requires you to think about a number of data management issues before, during and after the study: Where do I store my data? In what format? Which meta-data needs to accompany the data set? How do I get the data in a format that can be accessed later? What kind of participant data am I allowed to store? And for the department: Is the data still accessible and usable when the researcher leaves the institute? Who has ownership of the data?
With the growth of universities in terms of number of researchers, the growth in the number and size of data sets, and the desire to share and combine data across the globe, coupled with stricter government rules on privacy, adequate data management becomes ever more important. However, as I have experienced myself, there are many issues with transforming “classical research”, characterised by a mainly individual (or small-team) mindset, into a Big Data, FAIR kind of working. For this transformation to succeed, it is fundamental that it is not (only) mandated by “the boss”, but actively encouraged by enthusiastic researchers who are excited about the benefits of this new way of working. Making researchers ambassadors for change can be very helpful in implementing new ways of doing research.
During my time at the UMC Utrecht and Utrecht University, I made some MATLAB databases so that, at least for the neuroimaging experiments, I could back-track all proceedings, retrieve the data quickly, and reuse it with new software or combine it with new data. Furthermore, I started my own Open Science initiative, allowing me to retrieve and share experiments and methods (see My Open Science). At the UU, I got involved in the YOUth study, a large-scale longitudinal cohort study following children in their development from pregnancy until early adulthood. My task in YOUth was to set up functional MRI studies and manage the data flow for that study. For that reason, I teamed up with the people developing YODA, as well as data managers from the UU Library, to set up a system for FAIR data handling. As a kind of mental exercise, I developed STEP: Structured Tools for Enhanced Processing (clickable version here). STEP is the framework connecting internal researchers, research assistants and management, and external parties (researchers from other institutes, new hires who are learning the process, ethical officers) to the data, research methods, and research quality measures. Furthermore, I developed HECTOR (Highly Effective Computing Tool to Obtain Results) as a practical MATLAB pipeline to generate (meta-)data for use in a FAIR environment. Finally, I developed procedures for YOUth to facilitate FAIR data acquisition (i.e. collecting all the relevant information from researchers about their study).
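To give a flavour of what generating (meta-)data for a FAIR environment can look like in practice, here is a minimal sketch in Python. It is not the actual HECTOR or YODA implementation (those are not described here); the field names, the SHA-256 checksum, and the sidecar-file convention are all illustrative assumptions about one common approach: writing a machine-readable JSON description next to each data file so it stays findable and verifiable later.

```python
import hashlib
import json
from datetime import date
from pathlib import Path

def write_metadata_sidecar(data_file: Path, creator: str, description: str) -> Path:
    """Write a JSON sidecar (<name>.json) describing a data file.

    The schema below is a hypothetical example, not the HECTOR schema.
    """
    # Checksum lets future users verify the file has not changed since acquisition.
    checksum = hashlib.sha256(data_file.read_bytes()).hexdigest()
    metadata = {
        "filename": data_file.name,
        "creator": creator,
        "description": description,
        "date_created": date.today().isoformat(),
        "sha256": checksum,
        "license": "CC-BY-4.0",  # placeholder; choose per study and privacy rules
    }
    sidecar = data_file.parent / (data_file.name + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar

# Example: create a dummy data file and its metadata sidecar.
data = Path("scan_001.csv")
data.write_text("subject,score\n01,42\n")
sidecar = write_metadata_sidecar(data, creator="researcher", description="pilot scores")
print(sidecar.name)  # scan_001.csv.json
```

The point of the sidecar convention is that the metadata travels with the data itself, so questions like “where did this file come from and has it been altered?” can be answered without tracking down the original researcher.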