Main

Traits, broadly speaking, are measurable attributes or characteristics of organisms. Traits related to function (for example, leaf size, body mass, tooth size or growth form) are often used to understand how organisms interact with their environment and other species via key vital rates such as survival, development and reproduction1,2,3,4,5.

Trait-based approaches have long been used in systematics and macroevolution to delineate taxa and reconstruct ancestral morphology and function6,7,8 and to link candidate genes to phenotypes9,10,11. The broad appeal of the trait concept is its ability to facilitate quantitative comparisons of biological form and function. Traits also allow us to mechanistically link organismal responses to abiotic and biotic factors with measurements that are, in principle, relatively easy to capture across large numbers of individuals. For example, appropriately chosen and defined traits can help identify lineages that share similar life-history strategies for a given environmental regime12,13. Documenting and understanding the diversity and composition of traits in ecosystems directly contributes to our understanding of organismal and ecosystem processes, functionality, productivity and resilience in the face of environmental change14,15,16,17,18,19.

In light of the multiple applications of trait data to address challenges of global significance (Box 1), a central question remains: how can we most effectively advance the synthesis of trait data within and across disciplines? In recent decades, the collection, compilation and availability of trait data for a variety of organisms have accelerated rapidly. Substantial trait databases now exist for plants20,21,22,23, reptiles24,25, invertebrates23,26,27,28,29, fish30,31, corals32, birds23,33,34, amphibians35, mammals23,36,37,38 and fungi23,39, and parallel efforts are no doubt underway for other taxa. Although considerable effort has been made to quantify traits for some groups (for example, Fig. 1), substantial work remains. To develop and test theory in biodiversity science, much greater effort is needed to fill gaps in trait data across the Tree of Life by combining and integrating existing data and trait collection efforts.

Fig. 1: Mammal, bird and plant phylogenies coloured according to the number of traits for which we have data for each species and lineage.

The plant phylogeny is sparsely populated for traits but contains more taxa (n = 10,596) than the mammal and bird phylogenies (n = 5,747 and 9,993, respectively). Trait data were downloaded from refs. 25,34,87. We counted the number of traits present across these datasets for each species and mapped those onto phylogenies using posteriors37,88 and a random subset of plant species within a single phylogeny89. Terminal branches (representing species) and ancestral lineages (using ancestral state reconstruction90) were coloured according to the number of reconstructed traits. Note that this is an exploratory analysis conducted purely to show variation in the availability of trait data across taxonomic groups.
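The counting step described in the caption (number of traits with data per species, pooled across source datasets) can be sketched as follows. This is a minimal illustration, not the analysis code behind Fig. 1: it assumes each dataset is a mapping from species name to trait values with missing values stored as None, the function name is hypothetical, and the toy inputs are placeholders rather than real data. Mapping counts onto phylogenies and ancestral state reconstruction are not shown.

```python
from collections import defaultdict


def count_traits_per_species(datasets):
    """Count distinct traits with at least one non-missing value per species.

    Each dataset is assumed to be a dict mapping species name -> {trait: value},
    with missing values stored as None. Traits are pooled across datasets, so a
    trait reported for the same species in two datasets is counted once.
    """
    traits_by_species = defaultdict(set)
    for dataset in datasets:
        for species, traits in dataset.items():
            # Accessing the key registers the species even if all its values are missing.
            present = traits_by_species[species]
            for trait, value in traits.items():
                if value is not None:
                    present.add(trait)
    return {species: len(traits) for species, traits in traits_by_species.items()}


# Hypothetical toy inputs; species names and values are placeholders.
dataset_1 = {"species_a": {"body_mass": 12.0, "litter_size": 3}}
dataset_2 = {
    "species_a": {"body_mass": 11.5, "longevity": None},
    "species_b": {"body_mass": None},
}
print(count_traits_per_species([dataset_1, dataset_2]))
# {'species_a': 2, 'species_b': 0}
```

Pooling traits as a set per species avoids double-counting a trait that appears in more than one source, which matters when databases overlap taxonomically.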

Current barriers to global trait-based science

Despite the recognized importance of traits, several common research practices limit our capacity for meaningful synthesis across the Tree of Life. These practices include failure to publish usable datasets alongside new findings40, missing or inadequate metadata41, minimal descriptions of methods used to collate, clean and analyse trait datasets in published works42, and inadequate coordination between researchers and institutions with common goals, such as filling strategic spatial or taxonomic gaps in trait knowledge43,44. Our limited ability to access and redistribute trait data contributes to the widespread reproducibility crisis within science45. Any study relying on data that cannot easily be re-used creates barriers to verifying its claims and thereby calls its reproducibility into question46, a concern of growing importance to many scientific journals. Such limitations have been common within trait-based science.

Access to data is not the only impediment to a global synthesis of trait knowledge. Barriers to synthesis exist because researchers and institutions are apprehensive that the time and resources they spend to create new observations or share legacy data (for example, observations from field guides, specimens, or publications without data supplements) will not be recognized. Identifying who should receive credit for contributing trait observations (whether via co-authorship or other formal recognition) is a complex issue, particularly where data involve a chain of expertise (for example, when trait data are extracted from taxonomic treatments involving specimen collectors, digitizers, taxonomists and curators). Funding bodies are often reluctant to support data management, limiting recognition of the sizeable effort expended on creating bespoke solutions to curating and harmonizing trait data from different sources46.

Opportunities exist for expanding the spatial and taxonomic coverage of trait observations, particularly by strengthening interdisciplinary connections within single organismal groups. Certain plant traits (for example, growth form, height and leaf size) are carefully catalogued in taxonomic species descriptions47, yet these data have only recently been exchanged with large-scale databases such as TRY21 or BIEN (http://bien.nceas.ucsb.edu/bien/). Although several informatics challenges in biodiversity science have now been overcome (for example, synthesizing global species occurrence information (https://www.gbif.org/) and sharing genetic data on individuals (https://www.ncbi.nlm.nih.gov/genbank/)), trait science lacks a vision for achieving global integration across all organisms. We argue that this is not simply a failure of the traits community to learn from existing successful networks. Instead, cataloguing traits is a more complex, highly context-dependent task and therefore needs a more refined network model than that offered by a centralized repository.

We propose that widespread adoption of key Open Science principles (Box 2) could be transformative for trait science in achieving a global synthesis. These principles would lay a strong foundation for transparency, reproducibility and recognition, and encourage a culture of data sharing and collaboration beyond established networks. Openness reinforces the scientific process by allowing increased scrutiny of methods and results, resulting in the deeper exploration of findings and their significance42,48,49,50,51. The scope of trait science would increase if researchers and institutions: (1) made datasets available in machine-accessible formats under clear licensing arrangements; (2) created and adopted standardized protocols, handbooks or metadata formats for data collection, documentation and management (see refs. 48,49); and (3) created human-centred networks to reduce the complexity of integrating existing data from disparate sources (for example, specimens, published literature, citizen-science initiatives50,51 and large-scale digitization efforts). These different sources exhibit systematic differences in error rates, validation, context, reproducibility and objectivity relative to field-collected trait observations. Without a model of recognition that embraces transparency and fairness, much trait data will remain hidden from science.

Introducing the Open Traits Network

The Open Traits Network (OTN) is a collaborative initiative for accelerating trait data synthesis. Specifically, it is a global, decentralized community welcoming all researchers and institutions pursuing the collaborative goal of standardizing and integrating trait data across all organisms. We promote five main objectives built upon Open Science ideals that could transform trait science:

  1. Openly sharing data, methods, protocols, codes and workflows.
  2. Citing original data collectors and providing scholarly credit.
  3. Providing appropriate metadata together with trait observations.
  4. Collecting trait data following reproducible, standardized methods and protocols (when available), or committing to their development.
  5. Providing training resources in trait collection and database construction using Open Science principles.

We envision a future for trait research where protocols for data exchange and re-use are transparent, research findings are reproducible, and all trait data (either newly collected or from legacy sources) are openly available to the research community and broader public. While several network models exist in trait research (Fig. 2), the OTN adopts a decentralized but connected structure with an emphasis on bringing people together through data and expertise.

Fig. 2: Architectures of three alternative networks in which research groups (nodes) interact in collecting and organizing trait data.

Black nodes are individuals, groups or institutions conducting projects. Light-green nodes are those harmonizing data and developing protocols, where node size is proportional to available resources. Dark-green nodes are synthesis nodes that collect standardized trait data and knowledge. a, Groups are disconnected and decentralized, risking duplication of effort (often the status quo). b, Groups are linked to a centralized repository, potentially limiting innovation. c, The Open Traits Network, represented by orange lines. Nodes are linked within biological domains (for example, plants or marine) and include expertise from diverse disciplines (for example, systematics, palaeobiology, ecology and biomechanics) allowing for more efficient and specialized decisions about trait collection. Data synthesis across domains or disciplines is facilitated by joining nodes based on common workflows, theoretical frameworks and data-sharing protocols that adhere to the principles of the Open Traits Network.

Often, groups building smaller-scale databases do so in isolation, using their own tools and workflows tailored to their research question; they are decentralized and disconnected (Fig. 2a). Decentralization has certain advantages, including retaining the power to determine which traits are most useful in a study system and how they should be compiled. However, because there is little formal support or interaction in this style of network, researchers often collect redundant data and develop similar tools for data collection, cleaning and integration, duplicating one another's effort. There are many small, isolated and heterogeneous data sources of this sort, widening the disconnect between pools of trait data52.

For some organisms, centralized hubs exist to aggregate and standardize trait data across disparate sources (see refs. 21,32,53,54,55,56,57) (Fig. 2b). These trait repositories have become the main access point for trait data on well-studied taxa such as plants and corals, but they remain mostly isolated, limiting the sharing of expertise and information across taxa. As these repositories continue to grow, difficulties with data integration and synthesis will also increase due to the momentum of entrenched workflows and exchange protocols that may not be interoperable.

Some successful large-scale initiatives have followed the centralized and connected network model (for example, the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/) and GenBank (https://www.ncbi.nlm.nih.gov/genbank/)). These platforms mandate strict data exchange protocols to facilitate synthesis using standardized vocabularies (for example, the Darwin Core58 and Humboldt Core59). These protocols have been central to the explosive growth of biodiversity data as they facilitate the exchange of information using common data formats58,59,60. Ontologies that provide unified terms and concepts necessary to represent traits have been developed (for example, Uberon, the multispecies anatomy ontology for animals61, and TOP, the Thesaurus of Plant characteristics62). These provide integration with other data types (for example, genetic and environmental) and their corresponding ontologies (for example, Gene Ontology63 and Environmental Ontology64).

Despite these successes, we argue that a centralized and connected network structure will not facilitate trait data synthesis. Trait observations are highly nuanced and hierarchical, and the many aspects of phenotype that traits describe cannot be reduced to a simplified set of exchange fields that apply across the Tree of Life. While the centralized and connected model (Fig. 2b) does have benefits, it lacks the flexibility needed to connect trait data where ontologies and exchange formats do not exist. The likely result is that established trait networks will remain isolated and disconnected.

The decentralized but connected model (orange connections in Fig. 2c) adopted by the OTN maintains the key advantages of a decentralized network (for example, taxon- or discipline-specific decision making) while enhancing the level of connectivity among groups, allowing for easier sharing of expertise, tools and data. These network characteristics also buffer against node loss (for example, due to lack of funding). Decentralized and connected networks are characterized by socially mediated improvements in learning65 as they capitalize on the aggregated judgement of many experts rather than singular opinions66. The OTN model builds on existing connections within disciplines and links domains across the Tree of Life to disseminate knowledge about traits. By recognizing the importance of specialist taxon groups (light-green nodes in Fig. 2c) and accommodating their needs in the development of cross-domain tools for synthesis (dark-green nodes in Fig. 2c), the OTN model will be particularly beneficial for low-profile taxa that may not be accommodated by a centralized effort to synthesize data. The OTN’s open, decentralized network structure will allow researchers to retain agency and independence while contributing to a collaborative effort that minimizes duplicated work.
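The claim that a decentralized but connected structure buffers against node loss can be illustrated with a toy graph calculation. The sketch below is an assumption-laden illustration, not a model of the OTN: the node names, the six-group networks and the choice of deleting the single best-connected node are all hypothetical. It compares the largest group of still-connected nodes after removing the hub of a centralized network (Fig. 2b style) with removing one well-connected group from a network where groups link directly to several peers (Fig. 2c style).

```python
from collections import deque


def undirected(edges):
    """Build an undirected adjacency map from (a, b) edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting the `removed` nodes."""
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Breadth-first search from each unvisited surviving node.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in removed and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


groups = [f"g{i}" for i in range(1, 7)]  # six hypothetical research groups

# Centralized: every group exchanges data only through one repository ("hub").
centralized = undirected([("hub", g) for g in groups])

# Decentralized but connected: groups link to peers in a ring plus one cross-link.
decentralized = undirected(
    [(groups[i], groups[(i + 1) % len(groups)]) for i in range(len(groups))]
    + [("g1", "g4")]
)

print(largest_component_size(centralized, {"hub"}))   # 1
print(largest_component_size(decentralized, {"g1"}))  # 5
```

Losing the hub leaves every group isolated, whereas losing one of the best-connected groups in the peer-linked network still leaves the remaining five groups able to reach one another.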

How (and why) to participate in the OTN

The OTN seeks to broaden its membership by lowering barriers to inclusion and advocating for approaches to trait science that benefit data custodians. New members can join the OTN via our website (www.opentraits.org) through two mechanisms: (1) adding a member profile (for example, name, location, expertise and collaboration statement); and/or (2) registering their open-source (or embargoed) trait datasets in the OTN Trait Dataset Registry (see Activity 1). The registry contains metadata for trait datasets and links users to the open dataset. New entries to the registry will be reviewed by OTN members before being added. This step will facilitate interaction between new and established OTN members and encourage deeper collaboration. Once registered, members will receive regular updates about the OTN, including newly registered trait datasets, notifications about upcoming chances for face-to-face meetings, and funding opportunities. Members will also benefit from the OTN through the sharing of resources, funding calls and workshops where appropriate.

OTN membership spans scientists (and institutions) with high-level expertise in trait data science and synthesis activities through to those with strong motivations to work with traits but little expertise. The OTN has already conducted an international workshop facilitated by an open call for participants, with more workshops planned. Following this initial communication process, we are currently sharing ideas and acting upon them within subgroups. As a decentralized network, the OTN does not need to rely on funding and dedicated personnel to complete tasks, though larger goals will benefit from financial support. Instead, we will communicate the joint aims and gaps between network nodes (Fig. 2) and arrange workshops and activities where necessary.

We recognize that altruism is unlikely to offer enough motivation to ensure widespread participation in the OTN. The sharing of trait datasets is not merely a technical problem to be solved; it relies on custodians having the skills, incentives and motivation to contribute. The key incentives for individuals to join the OTN include increasing the findability of their data and expertise and having access to a ready-made network of trait scientists and institutions engaging in relevant initiatives. Data are a powerful asset for researchers, and release under open-license schemes accompanied by well-defined metadata offers great potential for new collaborations and increased visibility. A persistent concern is that scientists will lose control of their hard-earned data under open licensing, though this underestimates the potential for new collaborations and may unnecessarily increase distrust within the scientific community67. Access to scientific networks can provide valuable exposure and connection49, particularly for early-career researchers and those in developing nations, although it is important to understand the risks involved. By emphasizing the importance of community engagement and support, the OTN seeks to make trait-data sharing and synthesis an opportunity for all involved rather than simply a technical challenge to be solved.

Milestones toward an open approach to trait-based science

We highlight five OTN activities (several of which are already operational) that demonstrate the power of a decentralized and connected network to increase knowledge transfer in trait science. Trait scientists have made significant achievements in key areas, such as the synthesis of large numbers of observations within taxonomic groups20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39 and the development of theory and frameworks for using these data to test ideas and conduct large-scale empirical studies1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19. However, the basic foundations needed to quantify how and why traits vary across organisms are still lacking.

Activity 1: Maintaining a global registry of trait-based initiatives

Several data gaps impede synthetic analyses across taxa, geographical locations and ontogeny. The heterogeneous ways in which trait data have been collected to date have resulted in a patchy and unrepresentative data landscape across trait types, taxa, regions and times of the year68,69. The OTN bridges these gaps by maintaining a Trait Dataset Registry that can be accessed at http://opentraits.org/datasets.html.

The OTN Registry contains information on existing open (or embargoed) datasets so that gaps can be identified and ultimately filled through collective effort. Core information for the registry includes Digital Object Identifier (DOI), taxonomic coverage, curator and format. The OTN Registry also provides the opportunity for contributors to identify if and where code to process and manipulate raw data is located (see Activity 2). As it develops, the OTN Registry will relate trait concepts to ontologies provided through the Open Biomedical Ontologies Foundry (http://www.obofoundry.org). The OTN Registry maps to several Open Science principles (for example, Open Source, Open Data and Open Access; Box 2) and is designed to support data retrieval and integration.
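A registry record and the gap-finding use described above can be sketched as a small data structure. This is a hypothetical illustration, not the OTN Registry's actual schema: the class name, field names, the `coverage_gaps` helper and the placeholder DOIs are assumptions, chosen only to mirror the core fields the text lists (DOI, taxonomic coverage, curator, format, and an optional pointer to processing code).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RegistryEntry:
    """One registered dataset. Field names are illustrative, not the OTN schema."""
    doi: str
    taxonomic_coverage: FrozenSet[str]
    curator: str
    data_format: str
    code_url: Optional[str] = None  # where code to process the raw data lives, if shared
    embargoed: bool = False


def coverage_gaps(entries, target_taxa):
    """Return the target taxa that no registered dataset covers, sorted by name."""
    covered = set()
    for entry in entries:
        covered |= entry.taxonomic_coverage
    return sorted(set(target_taxa) - covered)


# Hypothetical entries; the DOIs and curators are placeholders.
registry = [
    RegistryEntry("10.0000/hypothetical-1", frozenset({"Aves"}), "Curator A", "csv"),
    RegistryEntry("10.0000/hypothetical-2", frozenset({"Mammalia", "Reptilia"}),
                  "Curator B", "json", embargoed=True),
]
print(coverage_gaps(registry, ["Aves", "Amphibia", "Mammalia", "Fungi"]))
# ['Amphibia', 'Fungi']
```

A real registry would match taxa hierarchically (a dataset on birds also covers individual bird families), which this flat set comparison deliberately ignores.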

The OTN does not place restrictions on what members may consider traits of importance to a taxonomic group. Most traits can be measured on individuals and fit into existing definitions, though these definitions may not be appropriate for organisms whose individual or taxonomic boundaries are unclear (for example, microbes70 and fungi71). It can be argued that traits also encompass emergent properties of populations (for example, abundance and geographic range size) or represent interactions among species (for example, diet type). Within the OTN, we believe that engaging the community in discussion about the utility of available data for answering novel ecological and evolutionary questions is more important than imposing strict definitions of traits.

Activity 2: Sharing reproducible workflows and tools for aggregating trait data

The OTN leverages collaborative software development via platforms like GitHub (https://github.com/) to create modular, open-source software to access, harmonize and re-use data, with seamless piping of data from one software tool to the next. OTN contributors have already developed several open-source tools, such as the traitdataform package, which helps R users format their data and harmonize units (http://ecologicaltraitdata.github.io/traitdataform). The code for the Coral Traits database32 (https://github.com/jmadin/traits) could be modified to guide the creation of databases on other organisms. The FENNEC project provides a tool for accessing and viewing community trait data as a self-hosted website service72 (https://github.com/molbiodiv/fennec). The OTN can act as a connector between developers and the broader community seeking to synthesize trait data, facilitating the training of scientists in all aspects of reproducible data management.
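Unit harmonization, one of the steps such tools automate, can be sketched in a few lines. The example below is written in plain Python and does not reproduce the traitdataform API (which is an R package); the conversion table, record layout and `harmonize` function are hypothetical assumptions made for illustration. Each record declares its unit, each trait declares a target unit, and records with no known conversion are rejected rather than silently passed through.

```python
# Hypothetical multiplicative factors keyed by (source unit, target unit).
CONVERSIONS = {
    ("mg", "kg"): 1e-6,
    ("g", "kg"): 1e-3,
    ("kg", "kg"): 1.0,
    ("mm", "m"): 1e-3,
    ("cm", "m"): 1e-2,
    ("m", "m"): 1.0,
}


def harmonize(records, target_units):
    """Convert each record's value into the target unit declared for its trait.

    Returns new dicts and leaves the input records unchanged. Raises ValueError
    when no conversion factor is known, so unconverted values never mix in.
    """
    harmonized = []
    for rec in records:
        target = target_units[rec["trait"]]
        factor = CONVERSIONS.get((rec["unit"], target))
        if factor is None:
            raise ValueError(f"no conversion from {rec['unit']!r} to {target!r}")
        harmonized.append({**rec, "value": rec["value"] * factor, "unit": target})
    return harmonized


# Hypothetical records from two sources that report in different units.
records = [
    {"species": "species_a", "trait": "body_mass", "value": 1500.0, "unit": "g"},
    {"species": "species_b", "trait": "body_length", "value": 42.0, "unit": "cm"},
]
for rec in harmonize(records, {"body_mass": "kg", "body_length": "m"}):
    print(rec["species"], rec["value"], rec["unit"])
```

Failing loudly on an unknown unit is a deliberate design choice here: in a pipeline that pipes data from one tool to the next, a silently unconverted value is harder to detect downstream than an explicit error.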

Activity 3: Advocating for a free flow of data and appropriate credit

One goal of the OTN is to increase the use of open datasets and to ensure due credit is given to researchers who collect or synthesize primary data. Without effective reward or motivation for collecting new trait observations or sharing legacy data, a trait synthesis across the Tree of Life will remain unattainable. Currently, motivation for collecting and sharing new primary data is not strong and direct funding for trait data management is scarce.

The OTN can strengthen the attribution of credit to data providers and promote new data collection via two paths. Firstly, the OTN will encourage citation of primary sources via a permissive license model that secures authorship attribution (for example, Creative Commons Attribution 4.0 International; CC BY 4.0) and the use of DOIs and Open Researcher and Contributor ID (ORCID) identifiers. Open-access datasets with a DOI can be tracked to understand patterns of re-use and to assess the impact of the author’s decision to share.

There is an important distinction between sharing data within a network and making data publicly available under an open license. Clear license arrangements increase visibility and promote fair attribution and citation (for example, using Creative Commons licenses such as CC-BY or CC0). CC-BY requires attribution (that is, citation) to the original creator whereas CC0 does not legally require users of the data to cite the source, though this does not affect ethical norms for attribution in research communities (https://creativecommons.org/share-your-work/public-domain/cc0/). Identifying who should be credited for prior work on legacy data is complicated by the involvement of many individuals. This issue could be solved, in part, by inviting organizations to be named as contributors or co-authors on outputs using their data or (looking forward) implementing new ways of documenting who should be credited for making specimens or datasets usable in trait science.

Incentives to collect new trait data can be linked to the Open Science practice of pre-registration. In pre-registration, authors archive a public proposal for research activities (for example, via the Centre for Open Science; https://cos.io/prereg/) which, if approved, may receive in-principle acceptance from participating journals. As of March 2019, 168 journals are willing to give in-principle acceptance following pre-review of the study design prior to conducting field or experimental work. Ten of these participating journals regularly feature papers on trait-based science (for example, BMC Ecology and Ecology and Evolution). We envision a situation where the OTN Trait Registry (Activity 1) could be used to identify spatial or taxonomic gaps in trait data that could be coupled to pre-registered hypotheses. Together, pre-registration and in-principle acceptance of findings could incentivize the collection of new data, circumventing the growing reliance on available data with known gaps.

Activity 4: Creating a trait core to facilitate synthesis and standardization

Trait science requires its own ‘core’ terminology or data standard that is flexible enough to capture the complexity of trait data. Building on efforts to standardize occurrence data (that is, Darwin Core58) and biological inventories (that is, Humboldt Core43,59), the OTN envisions a trait core offering a set of cross-domain metadata standards and controlled vocabularies that are (ideally) connected to trait ontologies via unambiguous identifiers. This standard terminology would be implemented across trait-data publications, unifying data in decentralized repositories as well as centralized data portals.

A trait core would allow trait data to be: (1) interpreted accurately within the context of their collection (that is, including associated information on factors such as environmental conditions at collection sites, taxa covered, data custodians or collection methods); and (2) known by compatible terms so that observations of similar phenomena can be grouped and compared (that is, what is meant by ‘generation time’ or ‘establishment’ across taxonomic groups73,74). Existing initiatives may provide logical cornerstones for referencing terms and concepts, including the Ecological Metadata Language41. Several initiatives implement the Ecological Metadata Language (for example, The Knowledge Network for Biocomplexity75, Darwin Core58 and Humboldt Core59) and use terms from anatomy or phenotype ontologies (for example, the Plant Ontology66 and the Vertebrate Trait Ontology67) to relate traits to publicly defined terms, allowing annotated data to be processed computationally (http://www.obofoundry.org).
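The two requirements above (context travels with each value, and local trait names resolve to shared terms) can be sketched as a record type plus a vocabulary lookup. This is a hypothetical illustration only: it is not the Ecological Trait Standard or any published trait core, the field names and the `standardize` helper are assumptions, and the "hypothetical:" identifiers are placeholders rather than real ontology IRIs.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary mapping dataset-specific column names to a
# shared term. The "hypothetical:" identifiers are placeholders, not ontology IRIs.
VOCABULARY = {
    "bodymass": "hypothetical:body_mass",
    "body_mass_g": "hypothetical:body_mass",
    "BM": "hypothetical:body_mass",
    "gen_time": "hypothetical:generation_time",
}


@dataclass(frozen=True)
class TraitObservation:
    """A trait value together with the collection context a trait core would require."""
    taxon: str
    trait_term: str   # shared term from the controlled vocabulary
    value: float
    unit: str
    method: str       # measurement protocol used
    locality: str     # where the observation was collected
    custodian: str    # person or institution holding the data


def standardize(taxon, local_name, value, unit, method, locality, custodian):
    """Map a dataset-specific trait name onto the shared term, or fail loudly."""
    if local_name not in VOCABULARY:
        raise KeyError(f"{local_name!r} has no mapping to a shared term")
    return TraitObservation(taxon, VOCABULARY[local_name], value, unit,
                            method, locality, custodian)


# Two hypothetical datasets use different column names for the same trait.
a = standardize("species_a", "BM", 1.5, "kg", "field scale", "site 1", "Group A")
b = standardize("species_b", "bodymass", 0.8, "kg", "museum specimen", "site 2", "Group B")
print(a.trait_term == b.trait_term)  # True: both map to the same shared term
```

Because the method, locality and custodian fields are mandatory, an observation cannot enter the standardized form stripped of the context needed to interpret it, which is the point of requirement (1).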

Progress towards a trait core is already being made through the development of a prototypal Ecological Trait Standard76 (Box 3). However, the development and adoption of a trait core requires consultation and coordination within the broader scientific community, a goal which the OTN is ideally placed to advance. The OTN can mobilize expertise for cross-domain workshops and advocate for funding, which allows not only meetings of experts but also the creation of cyber-infrastructure for synthesis nodes (dark-green nodes in Fig. 2c). Links to emerging initiatives for biodiversity data standardization (for example, Species Index of Knowledge57) will also be vital for success, as will ratification of the core through the Biodiversity Information Standards (TDWG, www.tdwg.org).

Activity 5: Facilitating consistent approaches to measuring traits within major groups

The OTN will share new developments towards protocols and handbooks for major clades that standardize approaches to capture trait observations. Protocols are necessary because downstream activities such as developing metadata standards (Activity 4) will be impossible if trait measurement protocols do not exist. Some research communities have adopted standardized terms56,62 and data collection protocols (for example, plants20,77,78,79,80, invertebrates29,81,82,83, mammals36 and aquatic life30,32,84), though these may not always fit the requirements of some studies (for example, where trait variability rather than the average trait value of a species is targeted85). Protocols and handbooks may not emerge rapidly and should remain open to innovation through a commitment to version control and updates as techniques evolve. Two versions of the plant trait measurement handbook have been published77,86 and several online resources exist that can be updated regularly (see http://prometheuswiki.org/tiki-custom_home.php).

Standardizing approaches to trait measurement across research communities will reduce ambiguity when aggregating data and improve the quality of resulting datasets. Integrating trait standardization and databasing into taxonomic workflows constitutes a challenge and an opportunity7 that holds the promise of bridging the long disconnect between structural and functional traits. The presence of a range of biodiversity collections personnel in the OTN, together with an open invitation for more to join, is expected to catalyse the adoption of trait-based thinking in taxonomic practice.

Concluding remarks

This is the opportune time to push towards a new approach to sharing and synthesizing trait data across all organisms. Trait science has great potential to increase its taxonomic, phylogenetic and spatial scopes by leveraging data-science tools, embracing Open Science principles, and creating stronger connections between researchers, institutions, publishers and funding bodies. We hope that trait enthusiasts, regardless of field and research stage, will engage with the OTN via our website (www.opentraits.org) and help build new connections between disciplines, institutions and taxonomic domains. By adding metadata profiles for datasets to the OTN Trait Dataset Registry, trait collection efforts become more findable, as do the researchers who have compiled them. We envision that by connecting people with common goals, we can work collectively towards a synthesis of global trait data to preserve the nuances of taxon-specific expertise while also facilitating collaboration across domains. We urge scientists and institutions keen to commit to Open Science principles to make use of existing resources, including those offered by the Centre for Open Science (https://cos.io/), the Open Science Training Handbook (https://open-science-training-handbook.gitbook.io/book/), the Open Science Training Initiative (http://www.opensciencetraining.com/index.php) and FOSTER (https://www.fosteropenscience.eu/toolkit).

To support and expand the activities of the OTN, we will grow membership and develop communities around synthesis nodes to undertake key activities and secure funding support, in particular for the development of a trait core. Funding for international workshops, technical support and implementation meetings could drive a new era of trait-based synthesis that mirrors the achievement of similar initiatives such as GBIF, which now houses >1 billion occurrence records.

By supporting a reciprocal exchange of expertise and outputs using Open Science principles between researchers and institutions, we can mobilize data for a cross-taxa, worldwide, trait-based data resource to examine, understand and predict nature’s responses to global change. As a better-connected OTN emerges, data streams and coordination will improve, allowing us to deliver information to support globally important research agendas (Box 1) as well as specific data and knowledge to the public through integration with third-party portals. Lessons learned along the path to a global synthesis of trait data across all organisms will provide a framework for addressing similarly complex, context-dependent challenges in biodiversity informatics and beyond.