Derived Datasets Tool (BioDT Data Citation Training)
Introduction
The Derived Dataset Tool is a crucial component of the Biodiversity Digital Twin (BioDT) ecosystem, enabling researchers to create citable DOIs for their derived datasets. During the recent BioDT winter school in Lecce, we conducted an exercise on data citation, where students created their own Derived Datasets using the GBIF UAT portal. However, several students expressed confusion about the tool's functionality, particularly regarding the generation of dataset lists and occurrence counts from the source data file.
The Need for Derived Datasets
Derived datasets are essential in biodiversity research, as they allow researchers to create new datasets by combining existing ones. This process enables the creation of more comprehensive and meaningful datasets, which can be used for various applications, such as species distribution modeling, ecological niche modeling, and conservation planning. The Derived Dataset Tool facilitates this process by providing a user-friendly interface for creating citable DOIs for derived datasets.
The Derived Dataset Tool's Functionality
The Derived Dataset Tool requires a link to the source data for the Derived Dataset, which is the data record the researcher wants to generate a citable DOI for. The students were chiefly confused about why the tool does not simply derive the list of datasets (datasetKeys) and the number of occurrences used per dataset from this source data file. Their confusion suggests that this step of the interface needs to be more intuitive.
Feature Request: Calculating Dataset Lists and Occurrence Counts
One of the feature requests from the students was the ability to calculate the list of datasets and the number of occurrences per dataset from the source data file. This feature would significantly simplify the process of creating Derived Datasets and would make the tool more user-friendly. The ability to generate this list automatically would save researchers time and effort, allowing them to focus on more complex tasks.
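Until such a feature exists, the calculation the students asked for can be done locally. The following is a minimal sketch, not part of the tool: it assumes the source data file is a delimited text table with a header row and a datasetKey column, and the function name, the tab delimiter default, and the sample rows are all hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_occurrences_per_dataset(
    lines: Iterable[str],
    key_column: str = "datasetKey",
    delimiter: str = "\t",
) -> Counter:
    """Count rows per dataset key in a delimited occurrence table."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise ValueError(f"column {key_column!r} not found in header")
    counts = Counter()
    for row in reader:
        key = (row[key_column] or "").strip()
        if key:  # skip rows that carry no dataset key
            counts[key] += 1
    return counts


# Hypothetical three-row sample standing in for a real source data file.
sample = [
    "gbifID\tdatasetKey\tspecies",
    "1\taaaa\tParus major",
    "2\tbbbb\tParus major",
    "3\taaaa\tCyanistes caeruleus",
]
for dataset_key, n in count_occurrences_per_dataset(sample).most_common():
    print(dataset_key, n)
```

For a real file, an open file handle can be passed in place of the sample list, since the function only needs an iterable of text lines.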
The Importance of Data Citation
Data citation is a critical aspect of biodiversity research, as it enables researchers to credit the original authors of the data used in their studies. The Derived Dataset Tool plays a crucial role in this process by providing a citable DOI for derived datasets. However, the tool's current functionality can be improved to make it more user-friendly and efficient.
Update: User Account Issues
During the BioDT winter school, some students experienced issues logging in to the UAT Sandbox server with their GBIF user name. After investigating this issue, Tobias suggested that the problem might be limited to user accounts created through GitHub, Google, or ORCID rather than with a GBIF user name and password. The same account requirement is mentioned in the MDT metabarcoding toolkit user guide.
Conclusion
The Derived Dataset Tool is a valuable component of the BioDT ecosystem, enabling researchers to create citable DOIs for their derived datasets, but it could be more user-friendly and efficient. Calculating the list of datasets and the number of occurrences per dataset from the source data file would be the most direct way to simplify the creation of Derived Datasets, leaving researchers free to focus on more complex tasks.
Future Development
To improve the Derived Dataset Tool, we propose the following future development:
- Implement the feature to calculate the list of datasets and the number of occurrences per dataset from the source data file.
- Simplify the tool's interface to make it more user-friendly and intuitive.
- Address the user account issues, for example by stating clearly that accounts created through GitHub, Google, or ORCID may not be able to log in to the UAT Sandbox server.
- Provide more documentation and support for the tool to help researchers understand its functionality and usage.
By implementing these changes, the Derived Dataset Tool can become an even more valuable resource for the biodiversity research community, enabling researchers to create citable DOIs for their derived datasets with ease.
References
- GBIF UAT portal: https://www.gbif-uat.org/derived-dataset/about
- MDT metabarcoding toolkit user guide: https://docs.gbif-uat.org/mdt-user-guide/en/#trouble
- BioDT winter school: https://www.gbif.org/event/2025/biodiversity-digital-twin-winter-school
Derived Datasets Tool (BioDT data citation training) - Q&A
Introduction
The Derived Dataset Tool is a crucial component of the Biodiversity Digital Twin (BioDT) ecosystem, enabling researchers to create citable DOIs for their derived datasets. In our previous article, we discussed the tool's functionality and the need for improvements to make it more user-friendly and efficient. In this Q&A article, we will address some of the most frequently asked questions about the Derived Dataset Tool and provide clarification on its functionality.
Q1: What is the Derived Dataset Tool?
A1: The Derived Dataset Tool is a web-based application that enables researchers to create citable DOIs for their derived datasets. Derived datasets are new datasets created by combining existing ones, and the tool facilitates this process by providing a user-friendly interface for creating citable DOIs.
Q2: Why do I need to provide a link to the source data for the Derived Dataset?
A2: The source data link points to the data record you want to generate a citable DOI for. It is required so that anyone who follows the DOI can find the exact data the derived dataset refers to, which is essential for proper citation and credit.
Q3: Why can't the Derived Dataset Tool generate the list of datasets and the number of occurrences per dataset from the source data file?
A3: The Derived Dataset Tool is designed to provide a user-friendly interface for creating citable DOIs, but it does not have the capability to automatically generate the list of datasets and the number of occurrences per dataset from the source data file. This is because the tool is focused on creating citable DOIs, and the generation of this list is a separate process that requires manual input.
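Because the list has to be supplied by hand, it helps to check it before entering it. The sketch below is one possible approach under stated assumptions: it takes counts you have already computed, verifies that each key parses as a UUID (GBIF dataset keys are UUIDs) and that each count is a positive integer, and formats the result as two-column text. The function name and the headerless datasetKey,count layout are illustrative assumptions, so check the tool's own instructions for the format it expects.

```python
import csv
import io
import uuid
from typing import Mapping


def related_datasets_csv(counts: Mapping[str, int]) -> str:
    """Serialise {datasetKey: count} as headerless two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for dataset_key, count in sorted(counts.items()):
        uuid.UUID(dataset_key)  # raises ValueError if the key is not a UUID
        if not isinstance(count, int) or count < 1:
            raise ValueError(f"count for {dataset_key} must be a positive integer")
        writer.writerow([dataset_key, count])
    return buffer.getvalue()


# Made-up keys and counts, for illustration only.
example = {
    "11111111-2222-3333-4444-555555555555": 120,
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee": 7,
}
print(related_datasets_csv(example), end="")
```

Validating first means a mistyped key or a zero count is caught locally rather than after the list has been entered.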
Q4: Can I use the Derived Dataset Tool to create citable DOIs for datasets that are not hosted on the GBIF platform?
A4: No, the Derived Dataset Tool is specifically designed to work with datasets hosted on the GBIF platform. If you have a dataset hosted on another platform, you will need to use a different tool or service to create a citable DOI.
Q5: How do I troubleshoot issues with the Derived Dataset Tool?
A5: If you encounter any issues with the Derived Dataset Tool, please contact the GBIF support team for assistance. They will be able to help you troubleshoot the issue and provide guidance on how to use the tool effectively.
Q6: Can I use the Derived Dataset Tool to create citable DOIs for datasets that are not publicly available?
A6: No, the Derived Dataset Tool is designed to work with publicly available datasets. If you have a dataset that is not publicly available, you will need to use a different tool or service to create a citable DOI.
Q7: How do I cite a derived dataset created using the Derived Dataset Tool?
A7: To cite a derived dataset created using the Derived Dataset Tool, you will need to use the citable DOI provided by the tool. You can then use this DOI in your citation, along with the original authors and publication information.
Q8: Can I use the Derived Dataset Tool to create citable DOIs for datasets that are not in the GBIF database?
A8: No, the Derived Dataset Tool is specifically designed to work with datasets hosted on the GBIF platform. If you have a dataset that is not in the GBIF database, you will need to use a different tool or service to create a citable DOI.
Q9: How do I know if my derived dataset is citable?
A9: If you have created a derived dataset using the Derived Dataset Tool, you can check if it is citable by looking for the citable DOI provided by the tool. If the DOI is present, then your derived dataset is citable.
Q10: Can I use the Derived Dataset Tool to create citable DOIs for datasets that are not in the same language as the original dataset?
A10: Yes, the Derived Dataset Tool can be used to create citable DOIs for datasets that are not in the same language as the original dataset. However, you will need to ensure that the derived dataset is properly translated and formatted to ensure accurate citation and credit.
Conclusion
The Derived Dataset Tool is a valuable resource for researchers who want to create citable DOIs for their derived datasets. By understanding the tool's functionality and limitations, researchers can use it effectively to create citable DOIs and properly credit the original authors of the data used in their studies. If you have any further questions or concerns, please do not hesitate to contact the GBIF support team for assistance.