Results of survey on reuse of scientific data in materials research – Innomatsafety Consortium (28 August – 16 October 2020)

Summary

General information:
Total participants (including participants who answered only part of the questions): 76
Disciplines that were most strongly represented: Materials science (39 %), Toxicology (22 %), Chemistry (14 %)

Metadata:
61 % (of 38 participants) generate metadata
42 % (of 12 participants who do not generate metadata) are lacking an appropriate standard
42 % (of 24 participants) are using a free text format to collect metadata
25 % (of 24 participants) are using a community metadata standard:

– SEG-Y
– HPATH Ontology
– eNanoMapper Ontology
– Gracious templates
– NANoREG templates
– VoID vocabulary
– ISA-Tab

50 % (of 34 participants) are using in vitro assays
Most important assay metadata are:

– Assay type
– End points
– Detailed description of the (S)OP used

Data types used in the community:

– Material creation and characterization
– Exposure and interaction with biological systems
– Risk assessment
– Assay screening data
– Medical and clinical performance
– Topography of sea floors (Bathymetry)
– Various raw data (e.g. spreadsheet-based measurement data including instrument settings, confocal imaging, microscopic imaging, …)

Challenges around SOPs

Generation is very time consuming (23 % of 26 participants)
Incompleteness (15 % of 26 participants)
SOPs and metadata are not connected (15 % of 26 participants)
No system for validation in place (15 % of 26 participants)

96 % (of 26 participants) would like to see some kind of scoring system for the validation of data

Databases/Repositories

DBs are mainly used to find information in the fields of risk assessment and material characterization
28 % (of 25 participants) report not using any databases for their work
92 % (of 25 participants) keep both raw and analyzed data
31 % (of 26 participants) store their data on an institute or department server
35 % (of 20 participants who share data) share analyzed, curated data after publication
Reasons why participants do not use databases: missing search options, incomplete metadata, databases become unavailable

ELNs
76 % (of 25 participants) do not use ELNs

Reasons: No expertise in-house, no institutional support, difficulties with technical implementation


Detailed results

Question Answer Abs %

General

What is your area of expertise? Total 49 100

Materials Science 19 39

Toxicology 11 22

Chemistry 7 14

Biology 5 10

Information Science 2 4

Environmental Science 1 2

Other (Physics; Engineering; Nanomaterial Characterization) 4 8

Which position do you hold? Total 45 100

Group-/Team Leader 10 22

Professor 9 20

Scientist 6 13

Postdoc 6 13

Project manager 4 9

Doctoral candidate 3 7

Consultant 3 7

Data manager/IT specialist 2 4

General Manager 1 2

Technical staff member 1 2

Metadata

Do you generate metadata for your research data? Total 38 100

Yes 23 61

No 15 39

Why do you not generate metadata for your research data? Total 12 100

No suitable metadata standard for my work exists 5 42

My research data is self-descriptive 2 17

I don’t need metadata 2 17

Other 3 25

Do you use a standard for your metadata? Total 24 100

Free text file 10 42

Own proprietary metadata scheme 8 33

Metadata standard 6 25

Which metadata standard do you use? SEG-Y; BibTeX; HPATH ontology; multiple formats and standards to integrate data at https://search.data.enanomapper.net/; most data from the nanosafety cluster are generated in the form of Excel files following templates published on the JRC site (https://publications.jrc.ec.europa.eu/repository/handle/JRC103178 and https://publications.jrc.ec.europa.eu/repository/handle/JRC117733) as well as other templates, and we have recent publications describing the experience at https://www.mdpi.com/2079-4991/10/10/1908; VoID descriptors; ISA-Tab, EU-ToxRisk

Data types

Which data types are used in your research field? Total 36 100

Data about material characterisation 13 36

Data for exposure characterisation 10 28

Data for hazard assessment 9 25

Other 4 11

Which other data types are used in your research field? Output from instruments

Elevation grids (Bathymetry)

Data on the reactions of biological objects (e.g. tissue cells) to material properties

Biocompatibility; Clinical performance

Data about phenotypic / cellular characterisation, e.g. confocal microscopy

Different types of raw data, ranging from simple tables and images up to huge data sets coming from e.g. synchrotron sources.

Data on production and processing of materials

Clinical and medical records, assay screening data

Which data types are used about material characterisation? Total 29 100

Imaging data (e.g. light or electron microscopy) 18 62

Data about molecular or chemical structures 8 28

Other 3 10

chemical and mechanical properties; biocompatibility and biological responses (e.g. cytotoxicity); stability and degradation behaviour; particle size distribution; crystal structure; chemical composition; physico-chemical properties

Which assay types do you mainly use? Total 34 100

In vitro 17 50

In vivo 9 26

Omics based 4 12

Other: 4 12

Standard nanomaterial characterization techniques (light scattering, electron microscopy, Particle Tracking Analysis, XRD); we don’t use assays at all; binding assay, western blots

Which information is essential to describe in vivo or in vitro assay data for the safety of innovative materials to ensure optimal use? Total 28 100

Assay type (in vivo, in vitro) 6 21

Endpoint (genotoxicity; subacute; sensitization, mechanistic …) 5 18

Detailed description of applied SOP 5 18

Detailed description of study design 5 18

In-situ characterisation of materials in test system 4 14

Description of applied guideline 3 11

Which factors make the use of SOPs difficult? Total 26 100

Creating good SOPs is time-consuming 6 23

I cannot use SOPs because I develop new procedures/methods 4 15

The SOP is incomplete 4 15

The consistency is lacking in relation to data format and description 3 12

There is no system to assess the validity of SOPs 4 15

Metadata contain no references to the employed SOP 4 15

Other: 1 4

We use SOPs whenever possible but most other labs don’t. Comparison is difficult between labs; Standards need to be universal; Most researchers do not have an idea about SOPs and how to handle them correctly and develop them further

Data quality

Would the introduction of a "scoring system", in your opinion, be a good approach to assess data quality? Total 26 100

Yes 25 96

No: 1 4

Assessment based on the study design is not objective enough

Which criteria should be included? Total 24 100

Checking the completeness of the data with regard to standard study design (application route, test system, treatment duration, media, concentration …) 4 17

Data are traceable 4 17

Data acquisition and processing are described in detail 4 17

Checking the completeness of data by comparing to existing guidelines 3 13

The assay is described in full detail according to the OECD templates or can be found in repositories like DB ALM (Database Service on Alternative Methods to animal experimentation) 3 13

The assay performance is given with regard to reproducibility, sensitivity and specificity 3 13

Data are stored in relevant trustworthy databases 2 8

A standard procedure is in place to derive effective concentrations as point of departure for IVIVE (In vitro to in vivo extrapolation) or risk assessment 2 8

Use of databases/repositories

Which databases or repositories do you use for your research? Total 25 100
For human health hazard assessment 8 32
None 7 28
For materials characterisation 6 24
For assay/method description 3 12

Other application 1 4

For exposure 0 0

For environmental risk assessment 0 0

Which databases or repositories do you use in the field of materials characterisation? Metallographic database for images; homemade database under construction to gather all data from material design to in vivo

SciFinder; Spectral Database for Organic Compounds (SDBS)

Data safety sheets, Data product sheets, Product catalogues

Which databases or repositories do you use for your research in human health hazard assessment? PubMed and other literature databases to select all studies on health and toxicological issues

TofREf DB

Hess DB

eCHEMPORTAL

ECHA CHEM DB

CHEMBL

ToxCast

Comptox Chemicals Dashboard; OECD Toolbox; eChemPortal

VITIC (Lhasa Ltd.) for assessment of impurities

eTOX database (IMI)

Company internal dbs and DMS

BioStudies, ToxCast/Tox21, FDA databases, TG-GATEs, DrugMatrix, ToxRefDB, NanoCommons

msds

Which databases or repositories do you use for your research in assay/method description? ELN

GitHub

WormBase

WormBook

PubChem

Which other databases or repositories do you use for your research? Only literature databases such as PubMed, WoS, Google Scholar to get the published data

In your experience, what are the shortcomings of the databases you have mentioned? Total 18 100

It is difficult to find relevant data due to the lack of adequate search options 6 33

Metadata are insufficient or incomplete 5 28

Databases or repositories are no longer updated or available 3 17

Data relevant to my research are not included 3 17

Other: 1 6

Different access routes and different metadata standards/no standards/missing metadata making it hard to combine data

Data storage and sharing

Which type of research data do you keep? Total 25 100
Raw and analysed data 23 92
Analysed data 2 8
Where do you store these data? Total 26 100

On an institute/department server 8 31

On an external storage medium (e.g. external hard drive, USB-stick) 5 19

Locally on your PC 4 15

On a university server or cloud 2 8

Institutional repository 2 8

On a public cloud service 1 4

Disciplinary database 1 4

General purpose data repository (e.g. Dryad, figshare, Zenodo, RADAR) 1 4

Disciplinary community established repository 1 4

Other: 1 4

https://search.data.enanomapper.net/

Do you share your data (raw and/or analysed data)? Total 25 100

Yes 20 80

No 5 20

Why do you not share your data (raw and/or analysed data)? Total 5 100

I do not have time for the necessary data preparation 2 40

Unclear legal situation 1 20

I do not want to risk my know-how advantage 1 20

Other: 1 20

The database I build is a commercial product

Do you share your data (raw and/or analysed data)? Total 20 100

Curated and analysed data, when results have been published 7 35

Curated and analysed data 5 25

All data including raw data, when results have been published 4 20

All data including raw data 3 15

Other 1 5

In your opinion, are there any factors that make the re-use of data in your research field difficult and/or impossible? Total 25 100

Yes 14 56

No 11 44

Select any of the factors here listed that in your opinion make the re-use of data difficult in your research field? (Multiple answers are possible) Total 14 100

Lack of metadata makes it difficult to find and interpret data 3 21

Traceable assignment of experiments and results is difficult due to heterogeneous documentation (e.g. handwritten experimental procedure, digital evaluation of results) 3 21

Access to data is restricted 3 21

There is no linking of analyses or results to the corresponding experiment or scientific objective 3 21

Concerns about data theft or data misuse due to missing data privacy statements 1 7

Other: 1 7

Lack of common standards for data representation and user friendly tools

EU legislation and data protection

Studies publish their data in totally different ways with no standardized units (simply the concentrations used are given in different values). Traceability and interoperability are not guaranteed.

ELNs

Are you already using an Electronic Laboratory Notebook (ELN) for the documentation of your research? (25 Answers) Total 25 100

Yes 6 24

No 19 76

Why are you not using an Electronic Laboratory Notebook (ELN) for the documentation of your research? (19 Answers) Total 19 100

The ELN does not meet the technical requirements of the institute 2 11

The know-how required for the implementation is missing 3 16

The institute or faculty does not support ELN installation 3 16

No Open Source solution known 1 5

Commercial versions are too expensive 1 5

Missing staff for technical support 2 11

Difficult technical integration in the lab environment 3 16

Other: 4 21

Difficult to motivate/convince PhD students and researchers about usefulness and their own benefit/synergy

I select data and use these published data for risk assessment

We will be using ELN in the future and are in the course of selection

ELN do not meet the regulatory requirements of my institution (GLP).

In my former lab these are used intensely

We are analysing datasets, not producing them

Do guidelines or recommendations for handling research data exist within your department/institute (e.g., an institutional research data policy)? Total 25 100

Yes 14 56

No 6 24

I do not know 5 20

Why are there no guidelines or recommendations for the handling of research data (e.g. an institutional research data policy) in your department/institute? Total 6 100

Existing guidelines or recommendations are not adequate for my research 0 0

Knowledge of guidelines or recommendations is missing 5 83

I do not generate or process research data 0 0

Other: 1 17

There is no person responsible for establishing guidelines

Does your institution have a person responsible for legal issues relating to research data? Total 25 100

Yes 8 32

No 6 24

I do not know 11 44
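
Note on the % column: the percentages in the table appear to be the absolute count (Abs) divided by the number of answers to that question (Total), rounded to whole percent with halves rounded up (for example, 3 of 24 is reported as 13). The following is a minimal sketch of that check, not part of the survey itself; the function name `percent` and the variable names are illustrative assumptions. It uses the seven counts listed under "Which factors make the use of SOPs difficult?", which sum to 26.

```python
def percent(abs_count: int, total: int) -> int:
    """Whole-number percentage of abs_count in total, rounding halves up.

    Integer arithmetic is used so that 12.5 becomes 13; Python's built-in
    round() would round 12.5 to the nearest even number (12).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (200 * abs_count + total) // (2 * total)


# Counts copied from the SOP difficulty rows of the table (assumed order:
# time-consuming, new procedures, incomplete, consistency, no validity
# system, no SOP reference in metadata, other).
sop_counts = [6, 4, 4, 3, 4, 4, 1]
sop_total = sum(sop_counts)
sop_percentages = [percent(c, sop_total) for c in sop_counts]

print(sop_total)
print(sop_percentages)
```

Running the same function over any other question is a quick way to confirm that a percentage quoted in the summary uses the same denominator as the detailed table.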