Study the concept
Before you begin a research project, it is possible to estimate what type of data, and how much of it, your project will generate. Perhaps you are working on a grant application to support your research, or you are thinking about new equipment or software that your lab might need. At this stage your planning horizon may be very short, but it is still possible to make plans for your data:
- How could better data management practices increase your research efficiency?
- Have you considered a Data Management Plan? Does your funding source require or encourage such a plan?
Whether you collaborate directly with others or simply work in a university environment, you likely have access to shared resources such as shared disk space, high performance computing, and possibly even personnel such as statisticians or information technologists. When you are envisaging your data requirements, keep these shared resources in mind:
- What data formats do researchers in your field find useful?
- What software do researchers in your field use to analyze data?
Plan for it!
If you are still in the planning stages of a research project, or if you are seeking funding for one, now is the perfect time to begin a Data Management Plan. Funders will consider the forethought shown by planning for data management activities when evaluating your grant application. To show that you have really thought about your data before your project begins, be sure to address the following concepts.
What kind of data will you be collecting? Data type is often confused with data format, but the distinction is important. Think of data type as a summary of your content, while data format is a description of the carrier (the fruit versus the jar that holds it). Your data type(s) might include sensor data, instrument data, geospatial data, collated or aggregated data, observational data, simulation data, numerical data, tabular data, textual data, audio/visual data or any other representation of information that can be communicated digitally and reinterpreted by an expert. Describing the type(s) of data you will collect articulates one deliverable of your research methodology, and indicates to potential funders that you are aware of the nuances of different data types and have factored these issues into later lifecycle activities. The following is an example from the MIT Libraries Data Management and Publishing guide:
Observational: data captured in real-time, usually irreplaceable.
Examples: Sensor data, telemetry, survey data, sample data, neuroimages.
Experimental: data from lab equipment, often reproducible, but can be expensive.
Examples: gene sequences, chromatograms, toroid magnetic field data.
Simulation: data generated from test models where model and metadata (inputs) are more important than output data.
Examples: climate models, economic models.
Derived or compiled: data that is reproducible (but very expensive).
Examples: text and data mining, compiled database, 3D models, data gathered from public documents.
How much data will you collect? By estimating the total size of your data collection, you will be better able to support decision making for other lifecycle activities such as data collection and archiving. It is also important to estimate the range of individual file sizes, which may affect some of the infrastructure requirements for your project. Finally, estimating the number of files (or, in the case of tabular data, the size of your database) will show that you have fully considered size limitations and rates of growth for your research data.
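As a rough illustration of this kind of estimate, the short Python sketch below multiplies an assumed collection rate by an assumed average file size. The function name, the parameters, and every number in the example are hypothetical placeholders, not figures from any real project; substitute values from your own instruments or pilot data.

```python
from dataclasses import dataclass


@dataclass
class DataEstimate:
    file_count: int              # total number of files over the project
    total_gb: float              # size of one copy, in decimal gigabytes
    total_with_copies_gb: float  # size including backup copies


def estimate_collection(files_per_day: int, avg_file_mb: float,
                        days: int, copies: int = 1) -> DataEstimate:
    """Back-of-the-envelope estimate; all inputs are assumptions you supply."""
    file_count = files_per_day * days
    total_gb = file_count * avg_file_mb / 1000  # 1 GB = 1000 MB (decimal units)
    return DataEstimate(file_count, total_gb, total_gb * copies)


# Hypothetical example: 200 files per day, 25 MB each, 180 collection days,
# and three copies kept (one working copy plus two backups).
estimate = estimate_collection(files_per_day=200, avg_file_mb=25.0,
                               days=180, copies=3)
print(f"Files: {estimate.file_count}")
print(f"One copy: {estimate.total_gb:.0f} GB")
print(f"With backups: {estimate.total_with_copies_gb:.0f} GB")
```

An estimate like this is only a starting point. It ignores variation in file size and any growth in the collection rate, so it may be worth repeating the calculation with your smallest and largest expected file sizes to get a range.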
What is the story of your data? It is important to establish the inherent value of your data and to indicate how data management activities will maximize that value. Inherent value is often very easy for a researcher to describe, but it is harder to grasp for those outside the field. By taking the time to detail the value of your data, you will show that the data management activities outlined in your proposal build upon a valuable resource base, rather than simply fulfilling the requirement of a DMP.
What research question(s) are these data addressing? Your data will undoubtedly impact your field through publication, but a data management plan can help stimulate even broader impacts. Explain the impact that your research will have in your own field, but also explore the impact that your managed data might have on your own research group, your department or even policy for your institution. Also consider what value other fields might see in your data. For example, data with geospatial coordinates might be of interest to a wide audience outside of your own domain. Consider and argue the ways that managing your data might create a scholarly impact.
Finding help at MSU
Here at MSU we have many options to help you think about your data. Have you considered that there may be data available for you to re-use or re-purpose within your own department? Even the Library is active in collecting and maintaining access to data collections. Are you aware of the services offered at ICER and our very own HPCC? If you feel like you are the only one thinking about data, take some time to look at these resources: