Using BEAN-counter to quantify genetic interactions from multiplexed barcode sequencing experiments

Affiliations.

  • 1 Bioinformatics and Computational Biology Graduate Program, University of Minnesota Twin Cities, Minneapolis, MN, USA.
  • 2 Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA.
  • 3 RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan.
  • 4 Yumanity Therapeutics, Cambridge, MA, USA.
  • 5 Donnelly Centre, University of Toronto, Toronto, ON, Canada.
  • 6 Bioinformatics and Computational Biology Graduate Program, University of Minnesota Twin Cities, Minneapolis, MN, USA. [email protected].
  • 7 Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA. [email protected].
  • PMID: 30635653
  • PMCID: PMC6818255
  • DOI: 10.1038/s41596-018-0099-1

The construction of genome-wide mutant collections has enabled high-throughput, high-dimensional quantitative characterization of gene and chemical function, particularly via genetic and chemical-genetic interaction experiments. As the throughput of such experiments increases with improvements in sequencing technology and sample multiplexing, appropriate tools must be developed to handle the large volume of data produced. Here, we describe how to apply our approach to high-throughput, fitness-based profiling of pooled mutant yeast collections using the BEAN-counter software pipeline (Barcoded Experiment Analysis for Next-generation sequencing) for analysis. The software has also successfully processed data from Schizosaccharomyces pombe, Escherichia coli, and Zymomonas mobilis mutant collections. We provide general recommendations for the design of large-scale, multiplexed barcode sequencing experiments. The procedure outlined here was used to score interactions for ~4 million chemical-by-mutant combinations in our recently published chemical-genetic interaction screen of nearly 14,000 chemical compounds across seven diverse compound collections. Here we selected a representative subset of these data on which to demonstrate our analysis pipeline. BEAN-counter is open source, written in Python, and freely available for academic use. Users should be proficient at the command line; advanced users who wish to analyze larger datasets with hundreds or more conditions should also be familiar with concepts in analysis of high-throughput biological data. BEAN-counter encapsulates the knowledge we have accumulated from, and successfully applied to, our multiplexed, pooled barcode sequencing experiments. This protocol will be useful to those interested in generating their own high-dimensional, quantitative characterizations of gene or chemical function in a high-throughput manner.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

A license is required to use the BEAN-counter software (http://z.umn.edu/beanctr). It is free for academic use and must be purchased on a per-project basis for commercial use.

Figure 1. Overview of multiplexed barcode sequencing experiments and their processing using the BEAN-counter software.

Figure 2. Design of large-scale, pooled and multiplexed chemical-genetic interaction screens.

Figure 3. Schematic of the steps involved in processing large-scale interaction screens using BEAN-counter.

Figure 4. The large signature that we observe in and remove from most of our…

Figure 5. An inoculation date-related effect in one of our datasets. (Box 1) (a…

Figure 6. Typical barcode and index tag abundance distributions. Substantial deviations from these distributions could…

Figure 7. Manual examination of the dataset to flag mutants and conditions for removal.

Figure 8. Analysis of same-compound, same-index tag, and same-lane correlations to detect the presence of…

Figure 9. Removal of large, uninformative signature via singular value decomposition (SVD).


Grants and funding.

  • R01 GM104975/GM/NIGMS NIH HHS/United States
  • R01 HG005084/HG/NHGRI NIH HHS/United States
  • T32 GM008347/GM/NIGMS NIH HHS/United States


Open Access

Peer-reviewed

Research Article

ECD-CDGI: An efficient energy-constrained diffusion model for cancer driver gene identification

Roles Data curation, Methodology, Writing – original draft

Affiliation School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China

Roles Methodology, Supervision, Writing – review & editing

* E-mail: [email protected] (LZ); [email protected] (XF); [email protected] (QZ)

Roles Investigation

Affiliation College of Computer Science and Electronic Engineering, Hunan University, Changsha, China

Roles Supervision, Writing – review & editing

Roles Supervision

Affiliation Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China

  • Tao Wang, 
  • Linlin Zhuo, 
  • Yifan Chen, 
  • Xiangzheng Fu, 
  • Xiangxiang Zeng, 

PLOS

  • Published: August 30, 2024
  • https://doi.org/10.1371/journal.pcbi.1012400

This is an uncorrected proof.

The identification of cancer driver genes (CDGs) poses challenges due to the intricate interdependencies among genes and the influence of measurement errors and noise. We propose a novel energy-constrained diffusion (ECD)-based model for identifying CDGs, termed ECD-CDGI. This model is the first to design an ECD-Attention encoder by combining the ECD technique with an attention mechanism. The ECD-Attention encoder generates robust gene representations that reveal the complex interdependencies among genes while reducing the impact of data noise. We concatenate topological embeddings, extracted from gene-gene networks through graph transformers, with these gene representations. Extensive experiments across three testing scenarios show that the ECD-CDGI model not only identifies known CDGs proficiently but also efficiently uncovers unknown potential CDGs. Furthermore, compared to GNN-based approaches, the ECD-CDGI model is less constrained by existing gene-gene networks, which enhances its capability to identify CDGs. Additionally, ECD-CDGI is open-source and freely available. We have also launched the model as a complimentary online tool specifically crafted to expedite research efforts focused on CDG identification.

Author summary

Cancer has become a major disease threatening human life and health. Cancer usually originates from abnormal gene activities, such as mutations and copy number variations. Mutations in cancer driver genes are crucial for the selective growth of tumor cells. Identifying cancer driver genes is crucial in cancer-related research and treatment strategies, as it helps understand cancer occurrence and development. However, the complex gene-gene interactions, measurement errors, and the prevalence of unlabeled data significantly complicate the identification of these driver genes. We developed a new method that integrates an energy-constrained diffusion mechanism with an attention mechanism to uncover implicit gene dependencies in biomolecular networks and generate robust gene representations. Extensive experiments demonstrated that our model accurately identifies known cancer driver genes and effectively discovers potential ones. Furthermore, we analyzed and predicted patient-specific mutated genes, enhancing our understanding of their pathogenesis and advancing precision medicine. In summary, our method offers a promising tool for advancing the identification of cancer driver genes.

Citation: Wang T, Zhuo L, Chen Y, Fu X, Zeng X, Zou Q (2024) ECD-CDGI: An efficient energy-constrained diffusion model for cancer driver gene identification. PLoS Comput Biol 20(8): e1012400. https://doi.org/10.1371/journal.pcbi.1012400

Editor: Jinyan Li, Chinese Academy of Sciences Shenzhen Institutes of Advanced Technology, CHINA

Received: October 25, 2023; Accepted: August 10, 2024; Published: August 30, 2024

Copyright: © 2024 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Our code and data are publicly available in the GitHub repository: https://github.com/taowang11/ECD-CDGI.

Funding: This work received partial support from the Natural Science Foundation of China under Grant No. 62302339, to L.Z. Additionally, this work was partially funded by the Natural Science Foundation of China under Grant No. 62372158 to X.F. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Cancer is typically driven by the accumulation of genetic variations, including single nucleotide variations, small insertions or deletions, and copy number variations [1, 2]. Gene mutations can lead to activation or inactivation, promoting cancer occurrence and metastasis. Mutations in cancer driver genes (CDGs) give tumor cells selective growth advantages in evading immune cell clearance and drug treatment [3, 4]. Therefore, developing methods to identify CDGs is of great significance for cancer pathology research, as well as for the development of cancer diagnosis, treatment, and targeted drugs [5]. Recent advances in next-generation sequencing technology have helped researchers generate a vast amount of cancer genomic data and classify somatic mutations in common and rare cancer types [6]. Nevertheless, systematically identifying CDGs from large-scale human cancer genomic data remains a significant challenge [7, 8].

Many computational methods and tools have been developed to address this challenging issue in the past few years. Traditional computational methods for identifying CDGs can be divided into two main categories: mutation frequency-based and network-based. Mutation frequency-based methods generally assume that mutations in driver genes are more likely to recur across samples than mutations in non-driver genes, and thus identify significantly mutated genes as CDGs [9, 10]. Network-based methods consider cancer to result from mutations in multiple genes that collectively play essential roles in cancer-related biological pathways [11, 12]. Despite the remarkable achievements of these methods in studying gene variations, some limitations remain. For example, mutation frequency-based methods often fail to detect driver genes with low mutation frequencies due to the lack of reliable background mutation frequencies. Additionally, when biological networks lack numerous associative relationships or contain a large amount of noisy data, network-based methods can identify driver genes with poor accuracy.

Recently, machine learning (ML) techniques, particularly deep learning methods, have achieved tremendous success in identifying CDGs [13–15]. ML-based approaches frame the prediction of driver genes as a classification task, leveraging available data and knowledge to identify driver genes or driver mutations. Typically, these methods utilize a low-dimensional representation of genes' multi-omic feature vectors, subsequently employing classifiers to identify CDGs. For instance, Parvandeh et al. utilized cancer gene network data to calculate the differences between nodes using the Minkowski distance [16]. They integrated the nearest neighbor algorithm and evolutionary scoring calculation to identify potential CDGs. Similarly, Han et al. trained an ensemble of models on various types of gene mutations and then applied the Poisson distribution coupled with Monte Carlo simulations to discover CDGs with low background mutation rates [17]. In another study, Habibi et al. combined mutation data, protein-protein interaction (PPI), and biological process networks. They calculated the score of gene features, engineered a gene-gene network significantly linked to cancer, and performed cluster analysis to study CDGs [18]. However, these traditional machine learning approaches are limited because they neglect the complex interactions inherent in gene-gene networks. Graph neural networks (GNNs) offer a promising solution to this constraint. By employing an iterative message passing and aggregation mechanism, GNNs are capable of learning low-dimensional embeddings that capture the complex relationships among genes, based on their interactions within the network [19].

Consequently, GNNs have been instrumental in enhancing the accuracy of CDG identification [20–22]. For example, the EMOGI model incorporates diverse multi-omics data, including copy number variation, methylation, and the PPI network, to identify CDGs using graph convolutional neural networks (GCNs) [23]. The EMOGI model primarily focuses on a subset of genes in the PPI network, conducting training and evaluation solely at the node level. Building upon this, MTGCN integrates both CDG identification and interaction prediction tasks into a collaborative training framework, thereby improving the precision of CDG prediction [24]. These approaches utilize Chebyshev polynomials within the convolutional layers and separate the embeddings from their neighboring nodes during the aggregation process, which can effectively address the issue of "over-smoothing" often encountered with multiple iterative convolution operations. As a result, these models demonstrate superior performance compared to traditional GCNs [25] and Graph Attention Networks (GATs) [26]. However, these models do have their limitations. Specifically, biomolecular networks are typically highly heterogeneous, a condition primarily attributed to the diversity of genomic data, including gene expression, protein interactions, and metabolite profiles. To our knowledge, the message propagation in most GNN models is often influenced by nodes with high degrees. Consequently, gene features can be masked or dominated by heterogeneous, highly connected neighbors, which impedes the accurate representation of gene features. To overcome this limitation, Zhang et al. introduced the HGDC model based on graph diffusion models [27]. Initially, HGDC creates an auxiliary graph employing graph diffusion and random walk techniques and jointly trains it alongside the original graph to enhance node representation. Subsequently, it refines the propagation and aggregation mechanisms inherent in GCNs, making the model more suitable for heterogeneous biomolecular networks. Finally, it deploys a multi-layer attention classifier to accurately identify CDGs.

While existing models demonstrate strong performance in identifying CDGs, they have limitations. Most notably, these models often focus solely on the immediate neighborhood of nodes, overlooking potentially complex interdependencies between any two genes. Additionally, data noise introduced by errors in the collection process can further compromise performance. To address these challenges, we propose the ECD-CDGI model, which combines the diffusion process with an attention mechanism to unveil hidden relationships between any two genes and enhance CDG identification. In summary, the main contributions of this paper are as follows:

  • ECD-CDGI treats gene interactions as a diffusion process to keep gene expression globally consistent with respect to the underlying structure while mitigating the effects of noisy data and, for the first time, combines energy-constrained diffusion with attention mechanisms to identify CDGs.
  • We design an ECD-Attention encoder based on diffusion processes and attention mechanisms to capture implicit dependencies between genes in biomolecular networks. This approach generates robust gene representations, which are further enhanced by integrating topological information.
  • We introduce a hierarchical attention module to aggregate the output results across each layer during the information propagation process. By augmenting the diversity of node representations, this strategy subsequently improves the predictive accuracy of the ECD-CDGI model.
  • Extensive experiments indicate that the ECD-CDGI model possesses the ability to not only identify known CDGs but also efficiently uncover potential cancer genes. Moreover, compared to the GNN-based approach, the ECD-CDGI model exhibits lower constraints from gene-gene networks, which enhances its ability to identify potential cancer genes.

Materials and methods

The task of identifying CDGs generally draws upon multi-omics data sources including genomics, transcriptomics, proteomics, and metabolomics. The primary workflow applies dimensionality reduction techniques to these multi-omics datasets to extract low-dimensional representations of the genes in the biomolecular network. Subsequently, the representations of these genes are compared to the representations of known CDGs, enabling the prediction of CDGs. In this experiment, we utilize a gene set within a 58-dimensional feature space, as in the referenced work [27].

The efficacy of the proposed ECD-CDGI model in predicting CDGs was evaluated across three distinct biomolecular network datasets: PathNet [28], GGNet [29], and PPNet [30]. Specifically, the PathNet dataset comprises a network of interlinked biochemical pathways within cells or organisms, incorporating data from both KEGG and Reactome pathways. GGNet is constructed from RNA interaction data, forming a gene-gene network. Meanwhile, PPNet is extracted from the STRING database. Each of these datasets offers a unique perspective, contributing to a comprehensive evaluation of the model's performance.

In this study, the term "cancer driver genes" refers to genes that are clearly identified and widely recognized for their crucial roles in the initiation and progression of tumors. These genes are categorized as positive samples. Specifically, 711 well-established driver genes were sourced from the NCG database [31], and an additional 85 high-confidence driver genes were identified using the DigSEE tool [32], totaling 796 genes. The positive samples across the PPNet, GGNet, and PathNet networks are derived from these genes. Additionally, drawing on prior findings [23], negative samples were selected by excluding genes that are 1) listed in the NCG database [31], 2) linked to "cancer pathways" in the KEGG database [33], 3) listed in the OMIM disease database [34], 4) predicted by MutSigdb [9] to be cancer-related, or 5) similar in expression pattern to known cancer genes. Generally, negative samples comprise genes that are unlikely to be related to cancer. The data used in this study are presented in Table 1.
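The labeling rules above amount to a few set operations. The following is a minimal sketch, not the authors' code: the function name `label_genes`, the toy gene identifiers, and the assumption that each exclusion criterion (including expression similarity) has already been reduced to a set of gene names are all illustrative.

```python
def label_genes(network_genes, known_drivers, exclusion_sets):
    """Split the genes of one network into positives, negatives, and unlabeled genes.

    network_genes  : iterable of gene identifiers present in the network
    known_drivers  : iterable of curated driver genes (positive labels)
    exclusion_sets : list of iterables, one per exclusion criterion
    """
    network_genes = set(network_genes)
    # Positives: known drivers that actually occur in this network.
    positives = network_genes & set(known_drivers)
    # A gene matching any exclusion criterion cannot be a negative sample.
    excluded = set().union(*exclusion_sets)
    negatives = network_genes - positives - excluded
    # Everything else stays unlabeled and is only scored at prediction time.
    unlabeled = network_genes - positives - negatives
    return positives, negatives, unlabeled


# Toy identifiers, purely hypothetical.
network = ["G1", "G2", "G3", "G4", "G5"]
drivers = {"G1", "G9"}                    # G9 is not in this network
exclusions = [{"G2"}, {"G3", "G1"}]       # e.g. pathway and disease lists
positives, negatives, unlabeled = label_genes(network, drivers, exclusions)
```

Because the positive set is intersected with each network separately, the same 796-gene list can yield a different number of positives per network.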

https://doi.org/10.1371/journal.pcbi.1012400.t001

Problem formulation

The proposed ECD-CDGI model leverages an encoder grounded in both energy-constrained diffusion processes and attention mechanisms. To facilitate a comprehensive understanding of this model and its architecture, we delineate the foundational principles and associated technologies underpinning the model in this section.

Energy-constrained diffusion process

In this way, the diffusivity serves as a measure of the influence between any two nodes and can also be interpreted as attention of each node-node pair. This insight informs the architecture of encoders built on energy-constrained diffusion processes and attention mechanisms.

Model architecture

Fig 1 illustrates the architecture of the ECD-CDGI model, comprising primarily three modules: the Data Module, the Encoder Module (including the ECD-Attention encoder, GNN encoder, and residual connection), and the Multi-layer Attention Module. To enrich the datasets, both the initial feature vectors of gene nodes in the biomolecular network and the network's topological structure were extracted, as detailed in the materials section. To address the challenges posed by noisy observational data and latent dependencies among nodes within biomolecular networks, we design a novel encoder, termed ECD-Attention. This encoder is grounded in energy-constrained diffusion processes and attention mechanisms. Fig 1(D) illustrates the energy-constrained diffusion process, wherein the energy (information) from each node is distributed to all other nodes in the network, ensuring that the state of each node is influenced by that of every other node. Simultaneously, a GNN encoder is used to mine the topological structure of the biomolecular network, thereby augmenting gene representations. Employing a multi-layer attention mechanism, the proposed model assimilates information across multiple scales to efficiently identify CDGs.

The ECD-CDGI model employs an automatic approach to identify CDGs, comprising several key stages. Initially, the multi-omics data of genes within the biomolecular network are fed into the ECD-Attention encoder, while concurrently, the topological information is input into the GNN encoder. The features extracted from both encoders are then concatenated, followed by residual connections and layer normalization operations. Subsequently, leveraging the message propagation mechanism, the encoding process undergoes multiple iterations, generating multiple sets of gene representations. Ultimately, the multi-layered data are fused using the hierarchical attention module, resulting in the final node representations. These comprehensive representations are then employed to predict CDGs.
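To make the order of these stages concrete, the sketch below wires them together in NumPy. It is an illustration under stated assumptions rather than the published implementation: `global_encode` is only a stand-in for the ECD-Attention encoder, `gcn_encode` is a plain normalized-adjacency layer, the projection `W_mix` after concatenation and the input projection `W_in` are assumptions made so that the residual connection has matching shapes, and the fixed layer weights stand in for the learned hierarchical attention.

```python
import numpy as np


def layer_norm(H, eps=1e-5):
    """Normalize each node's feature vector to zero mean and unit variance."""
    mu = H.mean(axis=1, keepdims=True)
    var = H.var(axis=1, keepdims=True)
    return (H - mu) / np.sqrt(var + eps)


def global_encode(H, W):
    """Placeholder for the ECD-Attention encoder (feature-only branch)."""
    return np.tanh(H @ W)


def gcn_encode(H, A_hat, W):
    """Simple graph convolution: aggregate neighbors, then transform."""
    return np.tanh(A_hat @ H @ W)


def forward(X, A_hat, W_in, params, layer_weights):
    H = X @ W_in                      # assumed input projection to hidden size
    outputs = []
    for W_att, W_gnn, W_mix in params:
        both = np.concatenate(
            [global_encode(H, W_att), gcn_encode(H, A_hat, W_gnn)], axis=1
        )
        # Project the concatenation back, add the residual, then normalize.
        H = layer_norm(both @ W_mix + H)
        outputs.append(H)
    # Fuse the per-layer representations with one scalar weight per layer.
    w = np.asarray(layer_weights)[:, None, None]
    return (w * np.stack(outputs)).sum(axis=0)


rng = np.random.default_rng(0)
n, d_in, d = 6, 8, 4                  # toy sizes, not the paper's 58 and 100
X = rng.normal(size=(n, d_in))
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                # make the toy graph undirected
np.fill_diagonal(A, 1.0)              # add self-loops
A_hat = A / A.sum(axis=1, keepdims=True)
W_in = 0.3 * rng.normal(size=(d_in, d))
params = [
    (
        0.3 * rng.normal(size=(d, d)),
        0.3 * rng.normal(size=(d, d)),
        0.3 * rng.normal(size=(2 * d, d)),
    )
    for _ in range(4)
]
fused = forward(X, A_hat, W_in, params, layer_weights=[0.5] * 4)
```

Keeping every layer's output, instead of only the last one, is what lets the final fusion step mix representations from different propagation depths.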

The architecture of the ECD-CDGI model includes three principal modules: (A) Data Module, (B) Encoder Module, and (C) Multi-layer Attention Module. (A) The Data Module primarily contains the initial feature vectors and topological architecture of gene nodes within the biomolecular network. (B) The Encoder Module consists of three key components: a newly conceived ECD-Attention encoder based on the energy-constrained diffusion process (D), a GNN encoder, and a residual connection. (C) Employing a hierarchical structure, the Multi-layer Attention Module integrates data across various layers to formulate a comprehensive node representation, which is then used to identify CDGs effectively. (D) The energy-constrained diffusion process.

https://doi.org/10.1371/journal.pcbi.1012400.g001

ECD-Attention encoder.

Building on the insights gained from the Preliminary Section, the diffusion process is governed by energy constraints, which aim to reduce the overall system energy during diffusion, thereby stabilizing the system. Inspired by previous work [38], we introduce an ECD-Attention encoder that incorporates both energy-constrained diffusion and attention mechanisms. This encoder is crafted to ensure the local consistency of each gene node's current state during the information propagation process, which resembles the diffusion process, while also preserving global consistency with other gene nodes in the biomolecular network. Notably, the encoder effectively dampens the impact of data noise and reveals latent interdependencies between genes. The relevant principles and steps are presented in detail below.

Leveraging the energy-constrained diffusion and attention mechanisms, the diffusivity matrix in the diffusion process can be reinterpreted as an attention matrix for gene-gene pairs. Echoing the principles outlined in the Preliminary Section, a straightforward dot-product method is employed to quantify the similarity between any two genes. Furthermore, within the energy-constrained diffusion process, the node state update rule considers the state of all nodes, meaning each node’s state is influenced by every other node. Node state updates are executed by integrating the complete node-node similarity matrix with the value vector. Clearly, this approach is well-suited for the Transformer architecture. In the Transformer architecture, node-node attentions resemble the signal propagation rate S observed in energy-constrained diffusion processes. This process normalizes the similarity between nodes using dot product and sigmoid operations.
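As a rough illustration of the update described in this paragraph, the NumPy sketch below computes all-pair similarities with dot products, squashes them with a sigmoid, row-normalizes them into a diffusivity (attention) matrix S, and then mixes every node's state with the value vectors of all nodes. The function name, the step size `tau`, the row normalization, and the square shape of the projection matrices are assumptions made for this example and are not taken from the paper's equations.

```python
import numpy as np


def ecd_attention_step(Z, W_q, W_k, W_v, tau=0.5):
    """One diffusion-style update in which every node attends to every node.

    Z             : (n_nodes, dim) current node states
    W_q, W_k, W_v : (dim, dim) projection matrices (hypothetical shapes)
    tau           : assumed step size in [0, 1]; 0 leaves the states unchanged
    """
    Q, K, V = Z @ W_q, Z @ W_k, Z @ W_v
    # Dot-product similarity for all node pairs, mapped into (0, 1) by a sigmoid.
    scores = 1.0 / (1.0 + np.exp(-(Q @ K.T)))
    # Row-normalize so each node distributes a total weight of 1 over all nodes.
    S = scores / scores.sum(axis=1, keepdims=True)
    # Keep part of the old state and add the globally aggregated values.
    Z_next = (1.0 - tau) * Z + tau * (S @ V)
    return Z_next, S


rng = np.random.default_rng(0)
n_genes, dim = 5, 4                      # toy sizes
Z = rng.normal(size=(n_genes, dim))
W_q, W_k, W_v = (0.1 * rng.normal(size=(dim, dim)) for _ in range(3))
Z_next, S = ecd_attention_step(Z, W_q, W_k, W_v, tau=0.5)
```

Because S is dense, each gene's new state depends on every other gene, not only on its neighbors in the network. The cost is quadratic in the number of nodes for this naive form.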

GNN encoder

Residual connection

Multi-layer attention

To evaluate the efficacy of the ECD-CDGI model, we execute multiple sets of experiments using publicly available datasets. Initially, we engage in comparative analyses against state-of-the-art methods for CDG identification to validate the model’s superior capabilities. Subsequently, we design a series of ablation experiments to evaluate the individual contributions of various modules within the ECD-CDGI architecture. In the final phase, we delve into specific case studies and explore the scalability prospects of our proposed model.

Implementation detail

This study was implemented in Python with the PyTorch framework. The main settings concern the ECD-Attention encoder, the GCN encoder, and the multi-layer attention module, along with various hyperparameters. Genomic data served as the initial input for the model, with its dimensionality set at 58. In the ECD-Attention encoder, the transformation weight matrices are preset to a dimension of 100. The multi-layer attention module is configured with four layers by default, with each layer's initial weight preset at 0.5. Both the ECD-Attention and GCN encoders are integrated across 4 layers. Other hyperparameters include a hidden layer dimension of 100, 100 training rounds, a default learning rate of 0.001, and Adam as the optimizer.
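For reference, the settings listed above can be gathered in one place. The dataclass below is only a convenience sketch: the class and field names are hypothetical and do not come from the released code, while the values are the ones stated in the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ECDCDGIConfig:
    """Hyperparameters as reported in the text; field names are illustrative."""
    input_dim: int = 58              # dimensionality of the input gene features
    attention_dim: int = 100         # transformation weight matrices in ECD-Attention
    hidden_dim: int = 100            # hidden layer dimension
    num_layers: int = 4              # encoder layers and attention layers
    init_layer_weight: float = 0.5   # initial weight of each attention layer
    epochs: int = 100                # "training rounds" in the text
    learning_rate: float = 0.001
    optimizer: str = "Adam"


config = ECDCDGIConfig()
# One initial fusion weight per layer, all starting at 0.5.
initial_layer_weights = [config.init_layer_weight] * config.num_layers
settings = asdict(config)
```

Freezing the dataclass guards against accidental changes to a setting in the middle of an experiment.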

Comparison experiment

We designed a series of benchmarking experiments on three publicly accessible datasets (GGNet, PathNet, and PPNet) to compare the performance of our ECD-CDGI model with six other methods. These comprise three advanced CDG prediction models, EMOGI [ 23 ], MTGCN [ 24 ], and HGDC [ 27 ], as well as three conventional GNN models, GCN [ 25 ], GAT [ 26 ], and ChebNet [ 40 ]. To ensure a fair comparison, each method was fed the same feature matrix corresponding to biomolecular networks. We ran 5-fold cross-validation ten times for each model. The final performance metrics, the average AUC and AUPR scores, are presented in Table 2 .
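The evaluation protocol (5-fold cross-validation repeated ten times, scored by AUC) can be sketched as follows. The helper names are hypothetical, the splits are not stratified by label (an assumption made for brevity), and AUPR is omitted to keep the example short.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        folds = [order[k::n_folds] for k in range(n_folds)]
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, test_idx


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Averaging the per-fold AUC values over all 50 splits gives a single summary number per model and dataset, which is the kind of figure a results table would report.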

https://doi.org/10.1371/journal.pcbi.1012400.t002

As reflected in Table 2 , EMOGI, MTGCN, HGDC, ChebNet, and our proposed ECD-CDGI model all performed well in the task of identifying CDGs, whereas the GCN and GAT models lagged behind. Notably, the EMOGI, MTGCN, HGDC, and ChebNet algorithms all employ Chebyshev polynomials to perform convolution operations. During the message propagation and aggregation phases, these models differentiate between neighboring nodes and the nodes themselves, thereby mitigating the performance degradation typically induced by over-smoothing. Building upon this, the HGDC model incorporates an auxiliary network constructed using graph diffusion and aims to enhance predictive accuracy through joint training with the original network. However, HGDC performs on par with, or even slightly below, the original ChebNet model. This suggests that the auxiliary network generated through graph diffusion may introduce unpredictable noise.

Our proposed ECD-CDGI model outperformed all competitors across all datasets. It led the second-best performing model by margins of 1.30%, 1.24%, and 2.13% in AUC, and by 1.57%, 2.02%, and 2.76% in AUPR. These results underscore the efficacy of the ECD-Attention encoder, which is grounded in energy-constrained diffusion and attention mechanisms. This encoder is adept at unveiling the complex interdependencies among genes. When combined with the GCN encoder to harness the topological information of the gene-gene network, it substantially enhances the quality of node representation. As illustrated in Fig 2 , we plotted the ROC and PR curves for each model on the three datasets. The curves for the ECD-CDGI model consistently lie above those of the other models and remain stable across datasets. This provides additional evidence that the ECD-CDGI model is both efficient and reliable in identifying CDGs.

ROC curves for multiple models on (a) PPNet, (b) PathNet, and (c) GGNet datasets; PR curves for (d) PPNet, (e) PathNet, and (f) GGNet datasets.

https://doi.org/10.1371/journal.pcbi.1012400.g002

Ablation experiment

This section evaluates the individual contributions of four key modules within the ECD-CDGI model: the ECD-Attention encoder, the GCN encoder, the residual connection, and the multi-layer attention mechanism. To this end, we conducted ablation experiments on the three datasets (GGNet, PathNet, and PPNet) while holding other variables constant. The term ’w/o ECD-Att’ denotes a model configuration that removes the ECD-Attention encoder, relying solely on the GCN encoder. Conversely, ’w/o GCN’ signifies a setup where the GCN encoder is excluded, with only the ECD-Attention encoder in place. ’w/o Residual’ means that the residual connection module has been removed, while ’w/o multi-Att’ means that the model drops the multi-layer attention mechanism and uses only the encoder’s final layer output for both training and prediction.

We ran 5-fold cross-validation ten times for each model configuration on the three datasets. The results are summarized as average values for the AUC and AUPR metrics, as detailed in Table 3 . In general, any version of the ECD-CDGI model that omits one of its key components, whether the ECD-Attention encoder, GCN encoder, residual connection, or multi-layer attention mechanism, experiences a decline in performance. The ECD-Attention encoder captures global information, revealing potential dependencies between indirectly connected genes. The GCN encoder receives information from neighboring nodes and effectively propagates messages based on gene interactions. Residual connections maximize the retention of original features during iterations, preventing the loss of information from nodes in previous layers. The multi-layer attention mechanism automatically learns weights and integrates node representations across weighted iterations, enhancing model performance.

https://doi.org/10.1371/journal.pcbi.1012400.t003

Diving into details, the model’s performance declines slightly on the GGNet dataset when the GCN encoder is omitted, whereas a more substantial decrease is observed on both the PathNet and PPNet datasets. Intriguingly, this pattern is reversed when the ECD-Attention encoder is omitted. This suggests that the high heterogeneity and complex topological structure of the GGNet dataset may make it difficult for GCNs to effectively capture the intricate relationships and dependencies within the data. The finding also highlights the ECD-Attention encoder’s ability to uncover latent interdependencies among genes, thus boosting the model’s overall performance. Most notably, the model experiences its poorest performance when the Residual module is omitted, indicating its critical role in mitigating the over-smoothing arising during information propagation. It is noteworthy that the Residual module serves as a pivotal element within the ECD-Attention encoder, supplying essential information about the node’s current state during the energy-constrained diffusion process.

Skewed distribution and enrichment analysis

We conducted extensive experiments and analyses across the GGNet, PPNet, and PathNet datasets to evaluate the capability of our proposed ECD-CDGI model to identify previously unknown CDGs. To mitigate the influence of random variables, we ran the ECD-CDGI model through 100 iterations on each of these datasets, thereafter analyzing the predicted gene scores.

As illustrated in Fig 3 , the gene scores predicted by the ECD-CDGI model across all datasets exhibit a positive skewness. A scant number of genes gain conspicuously high scores, deviating from the central cluster of the data, while the majority of gene scores hover between -2 and 0. This is likely attributable to the fact that the overwhelming majority of genes are not CDGs, resulting in only subtle variations in their scores. In contrast, the outliers in the dataset suggest a small subset of genes with markedly higher scores, pointing to a heightened likelihood of them being CDGs. Overall, the ECD-CDGI model demonstrates a robust ability to differentiate these CDGs from other non-CDGs.
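Positive skewness can be checked numerically with the standardized third moment. The sketch below uses the population form of the skewness coefficient and a made-up list of scores (not data from the study) in which most values sit between -2 and 0 and two values are far higher.

```python
def sample_skewness(xs):
    """Skewness as the third central moment divided by the variance to the power 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


# Illustrative, made-up scores: a bulk between -2 and 0 plus two high outliers.
toy_scores = [-1.8, -1.5, -1.2, -1.0, -0.9, -0.6, -0.3, 4.0, 6.5]
toy_skew = sample_skewness(toy_scores)
```

A value above zero indicates a long right tail, which is the pattern the text describes for the predicted gene scores.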

https://doi.org/10.1371/journal.pcbi.1012400.g003

We selected and merged the top 100 genes with the highest scores from three networks, resulting in a total of 178 unique genes. This was done to assess the ECD-CDGI model’s ability to recognize these genes. With reference to the DisGeNET database [ 41 ], these highly scored genes were further enriched. In Fig 4(A) each bar on the left represents a different cancer category; the length of the bar indicates the statistical significance of the gene set linked to that disease. A higher -log10(P) value correlates with a lower p-value, suggesting a stronger association between the gene set and the disease. These results suggest that these high-scoring genes are significantly associated with various diseases, predominantly cancers, particularly pancreatic tumors. To further investigate these genes, we conducted pathway and process enrichment analyses using KEGG pathways, GO biological processes, and other resources, categorizing the genes into clusters based on similarities. In Fig 4(B) , on the right, genes are depicted as nodes in different colors, each color representing a distinct enriched pathway. The size of each node correlates with the level of gene enrichment in the corresponding pathway. Purple lines between nodes indicate interactions among genes or the biological processes in which they participate. Of these, 44 genes (24.72%) showed significant enrichment in the "Cancer Pathway" (KEGG Pathway). These genes are likely pivotal in the genesis and progression of tumors. This underscores the capacity of the ECD-CDGI model to identify CDGs accurately, thereby aiding in the elucidation of cancer initiation and progression mechanisms as well as informing relevant treatment strategies.
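Two small computations underlie this paragraph: merging the top-scoring genes from several networks into one set of unique genes, and converting a p-value into the -log10(P) bar length. The sketch below uses hypothetical helper names and made-up toy scores; only the 44-of-178 percentage comes from the text.

```python
import math


def top_k(scores, k):
    """Return the k genes with the highest scores from a {gene: score} mapping."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def merged_top_genes(score_tables, k):
    """Union of the top-k genes from each network; overlapping genes are counted once."""
    merged = set()
    for scores in score_tables:
        merged |= top_k(scores, k)
    return merged


def neg_log10(p_value):
    """Bar length used in enrichment plots: a smaller p-value gives a larger value."""
    return -math.log10(p_value)


# Made-up toy scores for three networks (not values from the study).
toy_ggnet = {"TP53": 3.1, "KRAS": 2.7, "TTN": 0.4}
toy_ppnet = {"TP53": 2.9, "EGFR": 2.2, "KRAS": 0.1}
toy_pathnet = {"PTEN": 2.5, "TP53": 2.4, "EGFR": 0.3}
toy_merged = merged_top_genes([toy_ggnet, toy_ppnet, toy_pathnet], k=2)

# Share of the 178 merged genes enriched in the cancer pathway, as reported in the text.
cancer_pathway_share = round(100 * 44 / 178, 2)
```

Because the same gene can rank highly in more than one network, the union is smaller than the sum of the individual lists, which is why three top-100 lists yield 178 rather than 300 genes.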

(a) Results of gene enrichment analysis for various cancers using the ECD-CDGI model; (b) Enrichment analysis leveraging KEGG pathways and GO biological processes.

https://doi.org/10.1371/journal.pcbi.1012400.g004

Identifying new cancer genes

To validate the efficacy of the ECD-CDGI model in identifying novel cancer genes, we conducted targeted experiments. Specifically, we computed the average prediction probabilities for four categories of genes: known CDGs, non-CDGs, a set of potential cancer genes from the ncg7.1 database, and other genes across the GGNet, PathNet, and PPNet datasets. The results detailed in Fig 5 reveal that known CDGs garnered the highest average predicted probabilities, while non-CDGs received the lowest. This underscores the ECD-CDGI model’s capability to accurately differentiate between CDGs and non-CDGs. Intriguingly, the average predicted probability for potential cancer genes was also markedly higher than that for non-CDGs and other genes. This suggests that the ECD-CDGI model is not only proficient in identifying known CDGs but is also adept at uncovering potential cancer genes.

https://doi.org/10.1371/journal.pcbi.1012400.g005

Case analysis

We undertook a comparative analysis to evaluate the adaptability of the ECD-CDGI model across diverse datasets. Specifically, we selected the 50 genes with the highest predictive scores from each of the GGNet, PPNet, and PathNet datasets, and then quantified the number and percentage of CDGs among them. These findings are visually represented in Fig 6(A) through a Venn diagram. Interestingly, the likelihood of identifying a CDG that is unique to a single dataset is notably lower than that of finding one that appears across multiple datasets. This observation indicates that genes scoring highly across various datasets are more likely to be CDGs. It’s important to acknowledge that due to inherent constraints in each dataset, such as the presence of noisy data, the complexity of multi-omics data, and variations in gene topological networks, predictive inaccuracies may occur within the ECD-CDGI model. To mitigate these limitations, a cross-dataset analysis can be performed to enhance the precision in identifying CDGs.

(a) Venn diagram illustrating the quantity and proportion of CDGs identified by the ECD-CDGI model across three datasets. (b) Pie chart showing the proportion of known CDGs, cancer-related genes, and other genes identified as CDGs by the ECD-CDGI model on three datasets.

https://doi.org/10.1371/journal.pcbi.1012400.g006

Additionally, we analyzed the CDGs that were consistently identified across all three datasets. As depicted in Fig 6(B) , of the 26 genes analyzed, 19 were classified as CDGs, making up 73.08% of the total. Three genes, although not defined as CDGs, were listed as cancer-related in the ncg7.1 database and constituted 11.54% of the sample. The four remaining genes, TTN, PCLO, LRP2, and RYR2, accounted for the other 15.38%. While these genes are not cataloged in the ncg7.1 database, existing literature [ 42 – 44 ] suggests their significant relevance to cancer.

To investigate patient-specific CDGs, we gathered and assessed patient-specific data using the ECD-CDGI model. Mutant genes with higher prediction scores are more likely to be specific driver genes, potentially accelerating cancer progression. Specifically, we utilized the Xena tool [ 45 ] to collect somatic mutation data from 5776 patients across 14 cancer types in the TCGA database [ 45 ]. Initially, we screened and retained genes present in the GGNet, PathNet, and PPNet networks from the patients’ mutant gene data. Building on this, we selected 5535 patients with five or more mutant genes for further analysis. We quantified the mutant genes of each patient (see Fig 7 ) and observed that some patients had fewer than five cancer driver genes, with 2.40% of patients lacking any cancer driver genes in their mutations. Prior studies suggest that having five or more cancer driver genes may correlate with individual cancer development [ 46 ]. Therefore, identifying patients’ specific CDGs is crucial for targeted treatment.

https://doi.org/10.1371/journal.pcbi.1012400.g007

In this study, we assessed the ECD-CDGI model’s efficacy in identifying patient-specific CDGs for mutant genes, alongside relevant analyses. Specifically, the model was trained using omics data from 14 cancer types on three biomolecular networks: GGNet, PathNet, and PPNet. For each type of cancer, the model generated three predictive gene ranking lists. For each patient, the Rank algorithm [ 47 ] was employed to merge the three gene rankings into a consolidated final list. Subsequently, the top five mutant genes from the final ranking were selected as the specific CDGs for each patient. As illustrated in Fig 8 , within the PPNet network, the shortest distances between the identified driver genes were notably shorter than those between the mutant genes prior to screening. This suggests that the identified CDGs are closely interconnected, likely cooperating within shared biological pathways or functional modules. This tight linkage intensifies their impact on tumor formation, potentially accelerating tumor progression and malignancy.
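The merge-then-select step can be illustrated with a simple stand-in. The source does not describe how the cited Rank algorithm [ 47 ] works, so the sketch below substitutes plain mean-position aggregation, which is an assumption and not the method used in the study; the function names and toy gene labels are hypothetical.

```python
def aggregate_rankings(rankings):
    """Merge ranked gene lists by mean position (a stand-in for the cited Rank algorithm)."""
    genes = set().union(*rankings)

    def mean_rank(gene):
        # A gene missing from a list is placed just after that list's last entry.
        return sum(r.index(gene) if gene in r else len(r) for r in rankings) / len(rankings)

    # Sort by mean position; the gene name breaks ties deterministically.
    return sorted(genes, key=lambda g: (mean_rank(g), g))


def patient_specific_drivers(rankings, mutated_genes, top_n=5):
    """Keep only the patient's mutated genes, in merged-rank order, and take the top few."""
    merged = aggregate_rankings(rankings)
    return [g for g in merged if g in mutated_genes][:top_n]


# Toy rankings from three networks, best gene first (labels are placeholders).
toy_rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "C", "B", "D"],
]
toy_drivers = patient_specific_drivers(toy_rankings, {"B", "D", "Z"})
```

Genes that are mutated in the patient but absent from every ranking (here the placeholder "Z") never enter the merged list, so they cannot be selected.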

https://doi.org/10.1371/journal.pcbi.1012400.g008

In subsequent analyses, we focused on the top 500 genes with the highest prediction scores across the GGNet, PPNet, and PathNet datasets. After removing well-established CDGs, we consider the remaining genes as potential cancer genes. We then probed whether a relationship exists between these potential cancer genes identified by the ECD-CDGI and their connectivity to known CDGs.

As illustrated in Fig 9(A) and 9(B) , for the PPNet and PathNet datasets, the Spearman correlation coefficients are both below 0.1, and the p-values are well above the 0.05 significance threshold. This indicates only a marginal correlation. Fig 9(C) reveals that in the GGNet dataset, the Spearman correlation coefficient is 0.17, with a p-value of 0.0238, which falls below the 0.05 threshold and signifies a slight but statistically significant positive correlation between the two variables. These results suggest that the potential cancer genes identified by the ECD-CDGI model depend only weakly on connectivity to known CDGs. Importantly, this implies that the ECD-CDGI model is less constrained by existing gene-gene networks in identifying potential cancer genes. As a result, it is better suited for the discovery of novel cancer genes, a task that proves challenging for methods based on GNNs.
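The Spearman coefficient used here is the Pearson correlation of the ranks of the two variables. A minimal standard-library sketch follows; the function names are hypothetical, tied values receive their average rank, and the p-value computation is left out.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting, `x` would hold each candidate gene's prediction score and `y` its connectivity to known CDGs; a coefficient near zero means the ordering of one says little about the ordering of the other.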

https://doi.org/10.1371/journal.pcbi.1012400.g009

Discussion and Conclusion

This study investigates the pivotal importance of identifying CDGs for both cancer research and clinical treatment, and evaluates various methodologies geared towards this purpose. While existing machine learning and deep learning techniques are indeed effective, they come with inherent limitations. Most notably, these methods often overlook the complex interdependencies between any two genes and may be compromised by noisy data, a byproduct of data collection oversights.

To address these shortcomings, we introduce the ECD-CDGI model, which incorporates an energy-constrained diffusion process and an attention mechanism. Combined with GNNs and multi-layer attention techniques, our model offers a robust tool for identifying CDGs. Our specially designed ECD-Attention encoder not only uncovers the complex global interrelationships between any two genes but also captures nuanced local information specific to individual gene nodes. Additionally, we integrate residual connections within the model’s layers to mitigate the performance degradation caused by over-smoothing during inter-layer information propagation. Employing GNN technology, the ECD-CDGI model is capable of extracting topological information from gene-gene networks and leverages a multi-layer attention mechanism for predicting CDGs. Comparison and ablation experiments conducted on public datasets confirm the model’s superior performance. We anticipate that the ECD-CDGI model will assume a significant role in cancer research and treatment protocols, offering researchers an efficient tool for understanding the mechanism of cancer development.

Despite its efficacy in CDG prediction, the ECD-CDGI model has certain limitations. Firstly, the presence of missing or erroneous links in biomolecular networks can compromise the model’s performance. Excessive errors or missing links can mislead the learning process and diminish the model’s accuracy. Secondly, while graph neural networks utilize the topological information in biomolecular networks effectively, the absence of comprehensive omics data still impacts their performance. In practical applications, critical omics data, including gene expression, protein interactions, and metabolite profiles, are often incomplete or unavailable. This lack of data can prevent the model from fully understanding gene network interactions, potentially misleading its learning process. Additionally, integrating and synergizing various types of omics data presents challenges due to differing data characteristics and noise levels, where improper handling could impair the model’s performance. To address these issues, future work will focus on mitigating the identified problems. Firstly, we plan to employ debiasing and sampling techniques to minimize the effects of erroneous or incomplete data. Additionally, we will explore multi-omics fusion techniques to fully leverage diverse datasets. Concurrently, we will assess imputation methods to further diminish the impact of data gaps in omics datasets.

Inferred from Genetic Interaction (IGI)

The IGI evidence code is used for annotations based on experiments reporting the effects of perturbations of more than one gene product. Examples of these experiments include:

  • Genetic interactions involving two or more mutations that result in suppression or enhancement of a given phenotype, also synergistic (synthetic) interactions
  • Co-transfection experiments in which two or more genes are expressed in a heterologous system to assess functional interaction
  • Expression of one gene alters the phenotypic outcome of a mutation in another gene; the two genes may or may not be from the same species. In the literature, these types of experiments are variably referred to as: functional complementation, rescue experiments, or suppression

Key to deciding whether to use the IGI or IMP (Inferred from Mutant Phenotype) evidence code is consideration of the point of reference (i.e., what is being compared) to determine a possible interaction. If experiments interrogate the effects of multiple mutations or differences from the control, then use IGI . If experiments interrogate the effects of a single mutation or difference from the control, then use IMP (Inferred from Mutant Phenotype) .

Examples of IGI Usage

Genetic interactions such as suppression, enhancement, synergistic (synthetic) interactions, etc.

This use of the IGI evidence code refers to the more “traditional” genetic interaction experiments performed in model organisms as well as more recent approaches such as RNA-mediated knockdown or genome editing techniques. Note that genetic interaction experiments may be performed with both loss- and gain-of-function mutations. Consequently, curators will need to use their expertise to determine whether interaction phenotypes resulting from gain-of-function mutations are informative about the normal, wild type role of a gene or gene product.

  • Localized cell wall degradation is essential for proper cell fusion in the fission yeast, Schizosaccharomyces pombe. This process is accomplished by the localized action of degradative enzymes including several distinct glucanases that act on different polysaccharides. Deletion of multiple glucanases in S. pombe results in decreasing efficiency of cell fusion, indicating that each enzyme contributes additively to this process.
DB Object ID | DB Object Symbol | GO ID | DB:Reference | Evidence Code | With (or) From
PomBase:SPBC2D10.05 | exg3 | GO:1904541 (fungal-type cell wall disassembly involved in conjugation with cellular fusion) | PMID:25825517 | IGI | PomBase:SPBC646.06c (agn2)
PomBase:SPBC646.06c | agn2 | GO:1904541 (fungal-type cell wall disassembly involved in conjugation with cellular fusion) | PMID:25825517 | IGI | PomBase:SPBC2D10.05 (exg3)
  • The response to axonal injury requires the activities of MAP kinase and cAMP signaling pathways that are required, for example, for signaling growth cone formation. In C. elegans , the activity of the upstream-most kinase in one of the MAPK signaling pathways, DLK-1, is stimulated by Ca2+ influx mediated by the EGL-19 voltage-gated calcium channel. EGL-19’s regulatory role in the MAPK-mediated axon regeneration pathway was determined, in part, through doubly mutant animals containing an egl-19 hypermorphic mutation that results in occasional action potentials with significantly prolonged plateau phases and a dlk-1 loss-of-function mutation that showed a reduced axon regenerative response when compared to egl-19 alone. Note that in this example, reciprocal IGI annotations are not made, as the GO term selected for EGL-19 does not make sense for DLK-1.
DB Object ID | DB Object Symbol | GO ID | DB:Reference | Evidence Code | With (or) From
WB:WBGene00001187 | egl-19 | GO:1904922 (positive regulation of MAPK cascade involved in axon regeneration) | PMID:20203177 | IGI | WB:WBGene00001008
  • Example 3: Synergistic (synthetic) interactions
  • Disruption of the MSB2 gene in S. cerevisiae has no appreciable effects on the cell's ability to activate the High-Osmolarity Glycerol (HOG) pathway upon osmotic stress, or on cellular growth on high-osmolarity media. To identify potential osmosensors in the SHO1 branch of the HOG pathway, the authors screened for a mutant that is osmosensitive only in an msb2Δ background and recovered mutations in the HKR1 gene. Like mutations in MSB2, mutations in HKR1 alone confer no osmosensitivity to the cells.
DB Object ID | DB Object Symbol | GO ID | DB:Reference | Evidence Code | With (or) From
SGD:S000003246 | MSB2 | GO:0006972 (hyperosmotic response) | PMID:17627274 | IGI | SGD:S000002828 (HKR1)
SGD:S000002828 | HKR1 | GO:0006972 (hyperosmotic response) | PMID:17627274 | IGI | SGD:S000003246 (MSB2)

Co-transfection experiments

  • Co-transfection experiments include those experiments where two or more gene products are expressed in a heterologous system, such as a cell line, for the purposes of interrogating a functional interaction between them.
  • Example 1: Co-transfection of G protein-coupled receptors (GPCRs)
  • In C. elegans, the response to dauer pheromone, a mixture of small molecules, is mediated by G protein-coupled receptors (GPCRs). Genetic analysis has implicated two GPCRs, SRBC-64 and SRBC-66, in a signaling pathway that responds to specific components of dauer pheromone. To assess the biochemical role of SRBC-64 and SRBC-66, the gene products were expressed singly or in combination in HEK293 cells. Only when expressed in combination were the GPCRs able to enhance forskolin-stimulated cAMP production.
DB Object ID | DB Object Symbol | GO ID | DB:Reference | Evidence Code | With (or) From
WB:WBGene00021477 | SRBC-64 | GO:0007186 (G-protein coupled receptor signaling pathway) | PMID:19797623 | IGI | WB:WBGene00020746 (SRBC-66)
WB:WBGene00020746 | SRBC-66 | GO:0007186 (G-protein coupled receptor signaling pathway) | PMID:19797623 | IGI | WB:WBGene00021477 (SRBC-64)

Expression of one gene affects the phenotype of a mutation in another gene

  • These types of experiments are described in various ways in the published literature, but generally involve expressing a wild-type copy of one gene in the background of a mutation in a second, different gene to determine if the expressed gene can mask the phenotype of the mutated gene. The two genes may or may not be from the same species. When genes from different species are analyzed it is often with the intent of demonstrating functional conservation between species.
  • Example 1: Genes from different species
  • C. elegans contains two genes, lgg-1 and lgg-2, with sequence similarity to the Saccharomyces cerevisiae ubiquitin-like protein Atg8 that is required for autophagosome biogenesis. Transformation of lgg-1, but not lgg-2, into atg8 deletion mutants in nitrogen starvation medium results in increased survival compared to atg8 mutants alone, indicating that lgg-1 can functionally complement budding yeast atg8.
DB Object ID | DB Object Symbol | GO ID | DB:Reference | Evidence Code | With (or) From
WB:WBGene00002980 | lgg-1 | GO:0016236 (macroautophagy) | PMID:20523114 | IGI | SGD:S000000174 (atg8)
  • For these annotations, the With/From column should list the identifier for the endogenous gene that is complemented by the heterologously expressed gene being annotated. In annotations from cross-species functional complementation experiments, the gene referred to in the With/From column will thus be from a different species than the gene being annotated.
  • Example 2: Different genes from the same species
  • The planar cell polarity pathway is critical for a number of biological processes including epidermal wound repair. Activity of the GRHL3 transcription factor is essential for efficient wound repair in mice and human cell lines. Wound repair requires activation of the RhoA small GTPase to effect the cellular polarization, actin polymerization and epidermal migration critical to wound closure. The gene encoding RhoGEF19 (ARHGEF19), a RhoA GTPase activator, is a transcriptional target of GRHL3, and RhoGEF19 activity is also required for wound repair. Expression of human RhoGEF19 in human GRHL3-kd cell lines rescues the actin polymerization defects resulting from loss of GRHL3, indicating a role for RhoGEF19 in regulation of actin cytoskeletal organization during wound repair.
DB Object ID | DB Object Symbol | GO ID | DB:Reference | Evidence Code | With (or) From
UniProtKB:Q8IW93 | ARHGEF19 | GO:0032956 (positive regulation of actin cytoskeleton organization) | PMID:20643356 | IGI | UniProtKB:Q8TE85 (GRHL3)
UniProtKB:Q8TE85 | GRHL3 | GO:0032956 (positive regulation of actin cytoskeleton organization) | PMID:20643356 | IGI | UniProtKB:Q8IW93 (ARHGEF19)
  • Note that rescue experiments may be used to help determine the order in which gene products act within a biological pathway or process.
  • Example 3: Different genes from the same species
  • Localized assembly of a filamentous actin (F-actin) network at the leading edge of D. discoideum cells is required for proper chemotaxis towards the cAMP chemoattractant. The organization of actin filaments is regulated by intracellular pH; an increase in pH is necessary for chemotaxis and requires the Na+/H+ exchanger Ddnhe1. Expression of DdAip1, the D. discoideum ortholog of Actin-interacting protein 1, suppresses the chemotaxis defect of Ddnhe1 mutants by restoring the F-actin network, thus illustrating DdAip1's role in actin filament polymerization.
DB Object ID | DB Object Symbol | GO ID | DB:Reference | Evidence Code | With (or) From
DDB:G0278733 | aip1 | GO:0030041 (actin filament polymerization) | PMID:20668166 | IGI | DDB:G0275711 (nhe1)

Use of the With/From Field for IGI

  • The IGI evidence code requires curators to enter a stable database identifier for the interacting entity in the With/From field of the Gene Association File (GAF).
  • Independent interactors may be captured in the With/From field by separating each entry with a pipe.
  • If the interaction experiment involves multiple perturbations simultaneously, e.g. triply mutant strains, then the respective interactors are separated with a comma.
  • Protein-containing complexes
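The pipe and comma conventions above can be read mechanically. The sketch below is a hypothetical helper, not part of any GO tooling: it splits a With/From value into independent groups (pipe-separated), each holding the interactors perturbed together (comma-separated). The first two test strings reuse identifiers from the examples on this page; the combined string is invented for illustration.

```python
def parse_with_from(field):
    """Split a With/From value into independent groups of co-perturbed interactors.

    Pipes separate independent interactors; commas join interactors that were
    perturbed simultaneously (for example in a triply mutant strain).
    """
    if not field:
        return []
    return [[ident.strip() for ident in group.split(",")] for group in field.split("|")]


# A single interactor, as in the egl-19 annotation.
single = parse_with_from("WB:WBGene00001008")

# Two independent interactors (identifiers from this page, combined for illustration).
independent = parse_with_from("SGD:S000002828|SGD:S000003246")

# Invented example: one simultaneous pair plus one independent interactor.
mixed = parse_with_from("PomBase:SPBC2D10.05,PomBase:SPBC646.06c|SGD:S000000174")
```

Keeping the result as a list of lists preserves the distinction between "either of these interactors independently" and "these interactors together", which a flat list would lose.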

When IGI Should NOT be Used

  • GSK3B negative regulation of protein localization to nucleus (GO:1900181) PMID:23624080 transports_or_maintains_localization_of GATA6
  • inpp-1 response to odorant (GO:1990834) IMP
  • Human miR-133a mRNA binding involved in posttranscriptional gene silencing (GO:1903231) PMID:24920580 IDA has_direct_input SNAI1
  • Human miR-133a gene silencing by miRNA (GO:0035195) PMID:24920580 IDA regulates_expression_of SNAI1

Quality Control Checks

Evidence and Conclusion Ontology

ECO:0000316 genetic interaction evidence used in manual assertion

Annotating from Phenotypes

Curator Guide to GO Evidence Codes

Gene Ontology website GO Evidence Codes list

Review Status

Last reviewed: February 23, 2018

The ALICE experiment: a journey through QCD

The ALICE experiment was proposed in 1993, to study strongly-interacting matter at extreme energy densities and temperatures. This proposal entailed a comprehensive investigation of nuclear collisions at the LHC. Its physics programme initially focused on the determination of the properties of the quark–gluon plasma (QGP), a deconfined state of quarks and gluons, created in such collisions. The ALICE physics programme has been extended to cover a broader ensemble of observables related to Quantum Chromodynamics (QCD), the theory of strong interactions. The experiment has studied Pb–Pb, Xe–Xe, p–Pb and pp collisions in the multi-TeV centre of mass energy range, during the Run 1–2 data-taking periods at the LHC (2009–2018). The aim of this review is to summarise the key ALICE physics results in this endeavor, and to discuss their implications on the current understanding of the macroscopic and microscopic properties of strongly-interacting matter at the highest temperatures reached in the laboratory. It will review the latest findings on the properties of the QGP created by heavy-ion collisions at LHC energies, and describe the surprising QGP-like effects in pp and p–Pb collisions. Measurements of few-body QCD interactions, and their impact in unraveling the structure of hadrons and hadronic interactions, will be discussed. ALICE results relevant for physics topics outside the realm of QCD will also be touched upon. Finally, prospects for future measurements with the ALICE detector in the context of its planned upgrades will also be briefly described.

Researchers

Kenneth F Read Jr

Organizations

Top 10 Replicated Findings from Behavioral Genetics

Robert Plomin

King’s College London

John C. DeFries

University of Colorado

Valerie S. Knopik

Rhode Island Hospital and Brown University

Jenae M. Neiderhiser

The Pennsylvania State University

In the context of current concerns about replication in psychological science, we describe 10 findings from behavioral genetic research that have robustly replicated. These are ‘big’ findings, both in terms of effect size and potential impact on psychological science, such as linearly increasing heritability of intelligence from infancy (20%) through adulthood (60%). Four of our top-10 findings involve the environment, discoveries that could only have been found using genetically sensitive research designs. We also consider reasons specific to behavioral genetics that might explain why these findings replicate.

Introduction

A recent concern in psychological science is that many statistically significant findings, including some classic findings, do not replicate ( Pashler & Wagenmakers, 2012 ). This problem is not unique to psychological science. A landmark paper with the title ‘Why most published research findings are false’ ( Ioannidis, 2005b ) was relevant to all scientific research. It was accompanied by a paper that focused on medical research, showing that, of 49 most highly cited medical papers, only 34 had been tested for replication and, of these, 14 (41%) had been convincingly shown to be wrong; 5 of 6 studies (83%) with nonrandomized designs failed to replicate ( Ioannidis, 2005a ). Subsequent studies of attempts to replicate medical findings yielded similarly gloomy results ( Begley & Ellis, 2012 ; Prinz, Schlange, & Asadullah, 2011 ). Such research led to claims that 85% of research resources are wasted ( Macleod et al., 2014 ). In psychological science, a systematic attempt to replicate 100 studies found that only 36% yielded significant replication ( Open Science Collaboration, 2015 ). Another attempt to replicate 17 structural brain-behavior findings concluded that “we were unable to successfully replicate any” ( Boekel et al., 2015 ). Although much has been written about the diagnosis, cause and prescription for fixing these cracks in the bedrock of psychological science ( Ledgerwood, 2014a , 2014b ), there is consensus throughout science that the final arbiter is replication ( Jasny, Chin, Chong, & Vignieri, 2011 ; Schmidt, 2009 ).

In this context, the purpose of this paper is to highlight 10 findings about the genetic and environmental origins of individual differences in behavior that have consistently replicated. On the basis of our decades of experience in the field of behavioral genetics and our experience in writing the major textbook in the field ( Plomin, DeFries, Knopik, & Neiderhiser, 2013 ), we selected these 10 findings because in our opinion they are ‘big’ findings both in terms of effect size and their potential impact on psychological science. These findings are not novel precisely because we have selected results that have been repeatedly verified. For this reason, each of the findings in our top-10 list has been reviewed elsewhere and a few have been highlighted previously as ‘laws’ of behavioral genetics, as noted below. Although not all of these findings are supported by formal meta-analyses, we expect that most behavioral geneticists will agree with the 10 findings on our list, although we also suspect they would wish to add to the list. What is novel about our paper is that we bring together 10 reproducible findings from behavioral genetics and consider reasons specific to behavioral genetics that might explain why these results replicate and why others do not.

Before we turn to our list, we mention several preliminary issues. First, we should explain our use of the more modest word finding rather than the word law, which has been used previously in the context of describing replicable results from behavioral genetics (Chabris, Lee, Cesarini, Benjamin, & Laibson, in press; Plomin & Deary, 2015; Turkheimer, 2000; Turkheimer, Pettersson, & Horn, 2014). One reason to use the word finding is that law, like the law of gravity, connotes rules responsible for invariable results, and there are exceptions to our findings. We mention these exceptions, not to make the specious suggestion that exceptions prove the rule, but to point out that these exceptions are important because they stand out from the rest of the results. Another reason for avoiding the word law is that behavioral genetic statistics such as heritability ascribe variance in traits and covariance between traits to genetic and environmental sources; such statistics, like other descriptive statistics such as means, variances and correlations, may be limited by the samples, measures and methods employed. In terms of samples, for example, most of this research comes from developed countries and results could differ in less developed countries. Heritability describes ‘what is’ in a population – it does not predict what could be or prescribe what should be in that population or any other. It should also be emphasized that heritability does not refer to a single individual but rather to individual differences in a particular population at a particular time with its particular mix of genetic and environmental effects. Most importantly, heritability does not imply immutability (Plomin et al., 2013).

A second preliminary issue concerns background and documentation. Although we provide references that describe the methods and research that underlie these findings, we cannot include details about the methods, their limitations, or the research because it would require a book-length treatment. Indeed, most of these details can be found in our textbook from which these findings were abstracted ( Plomin et al., 2013 ).

Third, many of these findings are not limited to psychological traits. Most extend to physical, physiological and medical traits as well. However, we focus on psychological traits to avoid having the paper become even more unwieldy.

Fourth, we use a broad definition of the word replication in the sense of reproducing results. In our use of the word we include conceptual as well as direct replication ( Schmidt, 2009 ).

Fifth, our goal is to describe big behavioral genetic findings that replicate, rather than describing results that have not shown sufficient replication to be included in our list. Examples, which may become more convincing with more research, include differential heritability (attempts to show that certain personality traits are more heritable than others), sex differences in heritability, and genotype-environment interaction (attempts to show that heritability differs as a function of environment).

Finally, we note that four of the top-10 findings (2, 7, 8 and 9) are about environmental influences rather than genetic influences. By using genetically sensitive designs such as twin studies, behavioral genetics has revealed almost as much about the environment as about genetics.

1. All psychological traits show significant and substantial genetic influence

Psychological domains that have traditionally focused on individual differences are those that have been studied most using genetically sensitive designs, primarily the twin method that compares resemblance in pairs of identical and fraternal twins: cognitive abilities and disabilities, psychopathology, personality, substance use and abuse, and health psychology. Traits in these domains have consistently shown significant genetic influence in adequately powered studies ( Plomin et al., 2013 ), which has led this to be described as the first ‘law’ of behavioral genetics ( Turkheimer, 2000 ). (As discussed later, model-fitting analyses emphasize estimation of effect sizes and confidence intervals, which also provides evidence for statistical significance.) Although ubiquitous genetic influence is now widely accepted, this finding should not be taken for granted because it was a battleground in psychology even a few decades ago ( Pinker, 2002 ) and remains controversial in some areas such as education ( Check Hayden, 2013 ; Haworth & Plomin, 2011 ).

As an example, a review of the world’s literature on intelligence, which included 10,000 pairs of twins, showed that identical twins are significantly more similar than fraternal twins, with twin correlations of about 0.85 and 0.60, respectively, with corroborating results from family and adoption studies, implying significant genetic influence ( Bouchard & McGue, 1981 , as modified by Loehlin, 1989 ). Although most of this research was conducted in the United States and western European countries, significant genetic influence has been found in countries such as Russia, the former East Germany, Japan, and rural and urban India ( Plomin et al., 2013 ). Recent studies continue to report similar results, as seen for example in a report of 11,000 pairs of twins from six twin studies in four countries ( Haworth et al., 2010 ). We are not aware of a single adequately powered study reporting nonsignificant heritability.
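The step from twin correlations to a heritability estimate can be made concrete with a short calculation. The sketch below uses Falconer's classical formulas, which the text does not name and which are only a rough approximation to the model-fitting analyses the authors rely on; the function and variable names are illustrative assumptions.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance decomposition from twin correlations (Falconer's formulas).

    Assumes identical (MZ) twins share all additive genetic effects,
    fraternal (DZ) twins share half, and both share environments equally.
    """
    h2 = 2 * (r_mz - r_dz)  # additive genetic variance (heritability)
    c2 = 2 * r_dz - r_mz    # shared (common) environment
    e2 = 1 - r_mz           # nonshared environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Twin correlations for intelligence quoted in the paragraph above.
estimates = falconer_estimates(r_mz=0.85, r_dz=0.60)
print({name: round(value, 2) for name, value in estimates.items()})
```

With the correlations quoted above, the heritability estimate comes out at about one half of the variance. The three components sum to 1 by construction.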

As an example in the domain of psychopathology, a meta-analysis of 14 twin studies of schizophrenia found MZ concordances of about 50% and DZ concordances of about 15%, suggesting significant genetic influence (Sullivan, Kendler, & Neale, 2003), which has been corroborated in more recent studies (Cardno et al., 2012), as well as in adoption studies (Plomin et al., 2013). Other cognitive and psychopathological traits have not been studied as much as general intelligence and schizophrenia, but as these other traits are investigated they too repeatedly yield significant genetic influence, such as specific cognitive abilities and other aspects of psychopathology, such as autism and hyperactivity (Plomin et al., 2013). For personality, scores of twin studies have over the decades yielded evidence for significant genetic influence for dozens of traits studied using self-report questionnaires (Turkheimer et al., 2014), results confirmed in meta-analyses with adoption and family data as well as twin data on 24,000 pairs of twins (Loehlin, 1992). Many other traits have also been reported to show significant genetic influence, such as political beliefs, religiosity, altruism and food preferences (Plomin et al., 2013). A recent meta-analysis of nearly 18,000 traits from 3000 publications including 15 million twin pairs shows that this finding is not limited to psychological traits (Polderman et al., 2015).

As discussed later, a strength of behavioral genetics is its focus on estimating effect size (heritability). Beyond the conclusion that genetic influence is statistically significant, another consistent finding is that heritabilities are substantial, often accounting for half of the variance of psychological traits. For example, for general intelligence, heritability estimates are typically about 50% in meta-analyses of older family, twin and adoption studies (Chipuer, Rovine, & Plomin, 1990; Devlin, Daniels, & Roeder, 1997; Loehlin, 1989) as well as newer twin studies (Haworth et al., 2010), with 95% confidence intervals on the order of 45%-55%. For personality, heritabilities are usually 30%-50%. For example, wellbeing is a relative newcomer in relation to genetic analyses of personality; a meta-analytic review of 10 studies based on 56,000 individuals yielded a heritability estimate of 36% (34%-38%) (Bartels, 2015). It is sometimes said that the estimation of the effect size of heritability does not matter. However, surely it matters whether heritabilities are just 5% rather than 50% or perhaps 95%. For example, if heritability were near 100%, this would imply that environmental differences that exist in the population do not have an effect on a particular phenotype assessed at a particular stage in development. However, this would not imply that new environmental factors would also have no effect.

This research has primarily relied on the twin design that compares resemblance of identical and fraternal twins and the adoption design that compares resemblance of relatives separated by adoption. Although the twin and adoption designs have separately been criticized ( Plomin et al., 2013 ), these two designs generally converge on the same conclusion, despite making very different assumptions, which adds strength to these conclusions. An exciting development is the first completely new genetic design in a century, Genome-wide Complex Trait Analysis (GCTA; Yang, Lee, Goddard, & Visscher, 2011 ). GCTA uses hundreds of thousands of DNA differences (single-nucleotide polymorphisms, SNPs, which involve a difference in a single nucleotide) across the genome to estimate chance genetic similarity for each pair of individuals in a large sample of conventionally unrelated individuals and to relate this chance genetic similarity to phenotypic similarity. GCTA underestimates genetic influence for several reasons and requires samples of several thousand individuals in order to pick up the tiny signal of chance genetic similarity from the noise of DNA differences across the genome ( Vinkhuyzen, Wray, Yang, Goddard, & Visscher, 2013 ). Nonetheless, GCTA has consistently yielded evidence for significant genetic influence for cognitive abilities ( Benyamin et al., 2014 ; Davies et al., 2015 ; St Pourcain et al., 2014 ), psychopathology ( Davis et al., 2013 ; Gaugler et al., 2014 ; Klei et al., 2012 ; Lubke et al., 2012 ; Lubke et al., 2014 ; McGue et al., 2013 ; Ripke et al., 2013 ; Wray et al., 2014 ), personality ( Rietveld, Cesarini, et al., 2013 ; Verweij et al., 2012 ; Vinkhuyzen et al., 2012 ), and substance use/drug dependence ( Palmer et al., 2015 ; Vrieze, McGue, Miller, Hicks, & Iacono, 2013 ), thus supporting the results of twin and adoption studies.

Significant and substantial genetic influence on individual differences in psychological traits is so widespread that we are unable to name an exception. The challenge now is to find any reliably measured behavioral trait for which genetic influence is not significantly different from zero in more than one adequately powered study.

2. No traits are 100% heritable

Although heritability estimates are significantly greater than 0%, they are also significantly less than 100%. As noted above, heritabilities are substantial, typically 30% - 50%, but this is a long way from 100%. Again, we are unable to find any exception in which the heritability of a behavioral trait is near 100%. This is not a limitation of the methods because some traits, such as individual differences in height, yield heritabilities as high as 90%. However, it should be noted that behavioral traits are less reliably measured than physical traits such as height and error of measurement contributes to nonheritable variance. Many others have noted that no traits are 100% heritable (e.g., Plomin, 1989 ; Turkheimer, 2000 ).

Although this finding might seem obvious and unsurprising, it is crucial because it provides the strongest available evidence for the importance of environmental influence after controlling for genetic influence. Because genetic influence is significant and substantial, it is necessary to control for genetic influence when investigating environmental influence. Environmental research using genetically sensitive designs has led to three of the most important discoveries about the way the environment affects behavioral development, presented as findings 7, 8 and 9.

3. Heritability is caused by many genes of small effect

The two previous findings come from family-based genetic designs, primarily twin and adoption studies. Although the quantitative genetic model underlying these methods ( Fisher, 1918 ) assumes that many genes affect complex traits and common disorders, these methods cannot estimate how many genes are involved in heritability or the distribution of their effect sizes.

Powerful but overlooked evidence that many genes affect complex traits including behavior comes from selection studies in nonhuman animal research. If only a few genes were responsible for the heritability of a trait, selected lines would separate after a few generations and would not diverge any further in later generations. In contrast, selection studies of complex traits show a linear response to selection even after dozens of generations of selection, as seen for example (Figure 1) in one of the largest and longest selection studies of behavior that included replicate selected and control lines (DeFries, Gervais, & Thomas, 1978). Another overlooked point from selection studies is that genetic effects transmitted from parents to offspring can only be due to additive genetic effects (the independent effects of alleles and loci that ‘add up’), in contrast to nonadditive genetic effects, in which the effects of alleles and loci interact. This is important information because it would be difficult to identify specific DNA differences responsible for heritability if genetic effects on behavior were caused by interactions between many loci (epistasis).

Figure 1. Results of a selection study of open-field activity in mice. Replication was built into the design: two lines were selected for high open-field activity (H1 and H2), two lines were selected for low open-field activity (L1 and L2), and two lines were randomly mated within each line to serve as controls (C1 and C2). After 30 generations of such selective breeding, a 30-fold average difference in activity had been achieved, with no overlap between the activity of the low and high lines. (From DeFries, Gervais & Thomas, 1978.)

GCTA also provides evidence for the highly polygenic nature of quantitative traits and qualitative disorders because it shows that SNPs on each chromosome contribute cumulatively to the heritability estimated by GCTA ( Yang et al., 2013 ). The strongest evidence comes from a method called genome-wide association (GWA), which has been widely used in attempts to identify specific DNA associations with quantitative traits and qualitative disorders ( Manolio et al., 2009 ). An association is a correlation between a trait or disorder and the frequency of one of the two alleles (forms) of a SNP; for example, the frequency of a particular allele of the gene that encodes apolipoprotein E is about 40 percent for individuals with Alzheimer disease and 15 percent for control individuals who do not have the disorder.
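To put the apolipoprotein E example on the same scale as the fold-increases in odds used for other associations, the two allele frequencies can be converted to an allelic odds ratio. This is a minimal sketch using the standard odds-ratio definition; the function name is an illustrative assumption, and the frequencies are the approximate figures quoted above.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the allele among cases divided by the odds among controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# Approximate allele frequencies from the text: 40% in cases, 15% in controls.
apoe_or = allelic_odds_ratio(freq_cases=0.40, freq_controls=0.15)
print(f"Allelic odds ratio: {apoe_or:.2f}")
```

An odds ratio of this size, well above 3, is unusually large for a common variant.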

Earlier attempts to identify gene associations with behavior investigated a few genes thought to be ‘candidates’ on the basis of their function; however, such candidate gene studies have not generally replicated, for example, for schizophrenia ( Farrell et al., 2015 ) or intelligence ( Chabris et al., 2012 ). GWA is an atheoretical approach that uses hundreds of thousands or millions of SNPs covering most of the genome to detect population associations between a SNP and a trait.

GWA has been successful in detecting SNP associations for many traits and disorders (Visscher, Brown, McCarthy, & Yang, 2012), but it was a shock to discover that the largest effect sizes are extremely small (Gratten, Wray, Keller, & Visscher, 2014). For example, the largest associations in a GWA meta-analysis of over 36,000 diagnosed schizophrenic cases and 113,000 controls conferred less than a 1.1-fold increase in the odds of a schizophrenia diagnosis (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). In a GWA study of years of schooling, the three largest replicated SNP associations in a sample of 120,000 individuals each accounted for only 0.0002 of the variance of years of schooling in independent samples (Rietveld et al., 2014; Rietveld, Medland, et al., 2013). In other words, the largest effect sizes detected by GWA are extremely small for both disorders and traits. This finding has been noted by many others, and specifically in relation to psychological traits (e.g., Chabris et al., 2015; Plomin & Deary, 2015). These results are based on common SNPs, which have been used in GWA studies. Exciting results are emerging from other types of DNA variants, such as rare duplications and deletions of long stretches of DNA, called copy number variants (Farrell et al., 2015).

Our purpose here is not to discuss issues involved in using GWA to detect and replicate such small effects but rather to turn the results of GWA studies around. Although GWA has limited power to detect such minuscule effects even with samples in the tens or hundreds of thousands, these studies have tremendous power to detect larger effects (Robinson, Wray, & Visscher, 2014). For example, a GWA study of 20,000 individuals has 99.9% power to detect an association with an effect size that accounts for 1% of the variance (i.e., a correlation of 0.10). Because studies of this size have not found associations of that magnitude, this suggests that no such associations exist with effect sizes larger than 1% in the population. Some extremely rare mutations have large effects on individuals, but because they are rare their effect on the population is small. If the largest effects are so small, the smallest effects are likely to be infinitesimal, which implies that heritability is caused by many genes of small effect (Chabris et al., in press; Plomin & Simpson, 2013).
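The power figure quoted above can be checked approximately with the Fisher z-transformation of a correlation. The sketch below is not the method of the cited power analysis: it assumes a two-sided test at the conventional genome-wide significance threshold of 5 × 10⁻⁸, which the text does not state, and the function name is illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_to_detect_correlation(n: int, r: float, alpha: float) -> float:
    """Approximate power of a two-sided test of a correlation (Fisher z method)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    expected_z = atanh(r) * sqrt(n - 3)         # expected test statistic under r
    # The chance of a significant result in the wrong direction is negligible
    # and is ignored here.
    return 1 - std_normal.cdf(z_crit - expected_z)


GENOME_WIDE_ALPHA = 5e-8  # assumed threshold, not given in the text

# An effect explaining 1% of the variance (r = 0.10) in 20,000 individuals.
power_large = power_to_detect_correlation(n=20_000, r=0.10, alpha=GENOME_WIDE_ALPHA)

# For contrast, an effect explaining 0.0002 of the variance in 120,000 individuals.
power_small = power_to_detect_correlation(
    n=120_000, r=sqrt(0.0002), alpha=GENOME_WIDE_ALPHA
)

print(f"Power for 1% of variance:      {power_large:.4f}")
print(f"Power for 0.0002 of variance:  {power_small:.4f}")
```

Under these assumptions the first case is detected almost with certainty, whereas an effect of the size reported for years of schooling would be missed more often than not even in a much larger sample.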

4. Phenotypic correlations between psychological traits show significant and substantial genetic mediation

Much psychological research is about the relationship between traits. For example, a recent issue in this journal included reports on associations between creativity and mental health, stress reactivity and neuroticism, empathy and moral behavior, and personality and job performance. Few of the thousands of reported correlations between traits such as these have been studied using genetically sensitive designs. However, when genetically informed designs are used, research consistently points to a finding with far-reaching implications: Phenotypic covariance between traits is significantly and substantially caused by genetic covariance, not just environmentally driven covariance.

Multivariate genetic analysis estimates the extent to which genetic and environmental influences contribute to the phenotypic covariance between traits by comparing for example the cross-trait cross-twin correlations for MZ and same-sex DZ twins (i.e., correlating one twin’s X with the co-twin’s Y) ( Plomin et al., 2013 ). If the MZ cross-correlation is greater than the DZ cross-correlation, it suggests that genetic factors contribute to the phenotypic correlation between the traits, which is what we mean by the phrase genetic mediation .

Cognitive abilities have been studied most systematically from a multivariate genetic perspective. This research consistently shows that the phenotypic correlations among cognitive abilities are mediated significantly and substantially by genetic factors, called generalist genes ( Plomin & Kovas, 2005 ). For example, as shown in Figure 2 , a multivariate genetic analysis of intelligence, reading, mathematics and language in nearly 5000 12-year-old twins found that genetic factors consistently accounted for over half of the phenotypic correlations, ranging from 53% to 65%, with a mean of 61% and a mean 95% confidence interval of 53% - 67% ( Davis, Haworth, & Plomin, 2009 ). These findings have received support from multivariate GCTA ( Trzaskowski et al., 2013 ). One implication of this finding is that the phenotypic structure of domains is similar to their genetic structure, as has been shown, for example, for cognitive abilities ( Petrill, 1997 ) and personality ( Turkheimer et al., 2014 ).

Figure 2. Results of multivariate genetic latent variable analysis of general cognitive ability (g), reading, mathematics, and language of more than 5000 pairs of 12-year-old twins assessed on a web-based battery of measures. A = additive genetic effects; C = shared (common) environmental effects; E = nonshared environmental effects. Squares represent measured traits; circles represent latent factors. Multiple tests are used to index latent factors of g, reading, mathematics, and language. The lower tier of path coefficients represents factor loadings of the tests on the latent factor. The second tier of coefficients represents the genetic and environmental components of the variance of the latent variables – the path coefficients in this path diagram are the square roots of these coefficients. The curved arrows at the top represent genetic correlations, the extent to which genetic effects on one trait are correlated with genetic effects on another. The genetic contribution to the phenotypic correlation between two traits can be calculated as the product of the paths that connect them. For example, the genetic contribution to the phenotypic correlation between reading and math is $\sqrt{.70} \times .75 \times \sqrt{.61} = 0.49$. The phenotypic correlation is 0.76, which means that genetic factors account for 64% of the phenotypic correlation (i.e., .49 / .76 = .64). (From Davis, Haworth & Plomin, 2009.)
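The path-tracing arithmetic in the caption can be reproduced in a few lines. This is only a sketch of the worked example: the variable names are illustrative, and the caption does not say which of the two heritabilities belongs to reading and which to mathematics, so they are labeled generically.

```python
from math import sqrt

h2_trait_x = 0.70            # heritability of one latent trait (from the caption)
h2_trait_y = 0.61            # heritability of the other latent trait
genetic_correlation = 0.75   # curved arrow connecting the two genetic factors
phenotypic_correlation = 0.76

# Product of the paths connecting the traits through their genetic factors.
genetic_contribution = sqrt(h2_trait_x) * genetic_correlation * sqrt(h2_trait_y)

# Share of the phenotypic correlation attributable to genetic factors.
genetic_share = genetic_contribution / phenotypic_correlation

print(f"Genetic contribution: {genetic_contribution:.2f}")
print(f"Share of phenotypic correlation: {genetic_share:.2f}")
```

The two printed values round to the 0.49 and 0.64 given in the caption.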

More than one hundred twin studies have addressed the key question of comorbidity in psychopathology and this body of research also consistently finds substantial genetic overlap between common disorders (Cerda, Sagdeo, Johnson, & Galea, 2010; Kendler, Prescott, Myers, & Neale, 2003) in children (Rhee, Lahey, & Waldman, 2015) and in adults (Kendler et al., 2011). For example, a review of 23 twin studies and 12 family studies confirms that anxiety and depression are correlated entirely for genetic reasons (Middeldorp, Cath, Van Dyck, & Boomsma, 2005). In other words, the same genes affect both disorders, which means that from a genetic perspective they are the same disorder. Even the comorbidity between schizophrenia and bipolar depression, the first fork in the diagnosis of psychosis, is mainly due to genetic factors (Lichtenstein et al., 2009). Again, this implies that many of the same genes affect both disorders. These twin study findings of genetic overlap among disorders have received support from multivariate GCTA studies (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013a) and from GWA studies (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013c). For example, a recent review of molecular genetic studies of schizophrenia concluded: “There is evidence for shared genetic risk between schizophrenia, bipolar disorder, autism spectrum disorders, intellectual disability and attention-deficit and hyperactivity disorder” (Kavanagh, Tansey, O'Donovan, & Owen, 2015, p. 76). These results convey an important implication: the genetic structure of psychopathology does not map neatly onto current diagnostic classifications (Doherty & Owen, 2014). Moreover, correlations between personality dimensions and psychopathological diagnoses are also mediated genetically, most notably between neuroticism and depression (Kendler, Gatz, Gardner, & Pedersen, 2006).

This finding goes far beyond these well-known examples of genetic contributions to correlations in the domains of cognitive abilities and psychopathology. Whenever a phenotypic correlation is found between two behavioral traits, the genetic contribution to the phenotypic correlation is significant and substantial, with the usual caveat of adequate power, which is especially severe for low phenotypic correlations. As one of many such examples of new but as yet unreplicated findings of this type, genes accounted for more than 70% of the phenotypic correlations of about 0.30 between attitudes toward exercise and exercise behavior, meaning that many of the same genes affect the two traits ( Huppertz et al., 2014 ).

This finding extends even further, to the phenotypic correlations between behavior and other variables that are not ostensibly measures of behavior. One of our other findings is of this type: phenotypic correlations between behavioral measures and environmental measures ( Finding 8 ).

5. The heritability of intelligence increases throughout development

Unlike the other findings, this one is limited to a specific domain, general cognitive ability (intelligence), but it is one of the most surprising and counterintuitive findings from behavioral genetics. Although it would be reasonable to expect that experiences accumulate in their effect as time goes by, which some developmental theories propose (e.g., Baltes, Reese, & Lipsitt, 1980), the heritability of intelligence has consistently been found, over three decades of research, to increase linearly throughout the life course in longitudinal as well as cross-sectional analyses and in adoption as well as twin studies (McGue, Bouchard, Iacono, & Lykken, 1993; Plomin, 1986; Plomin & Deary, 2015). For example, as summarized in Figure 3, an analysis of cross-sectional data for 11,000 pairs of twins – larger than all previous twin studies combined – showed that the heritability of intelligence increases significantly from 41% in childhood (age 9) to 55% in adolescence (age 12) and to 66% in young adulthood (age 17) (Haworth et al., 2010). The non-overlapping standard errors in Figure 3 suggest that the increases in heritability across the three ages are significant, and model-fitting confirmed this. A meta-analysis of results from longitudinal twin and adoption studies also found increases in heritability from infancy through adolescence (Briley & Tucker-Drob, 2013). Some evidence suggests that heritability might increase to as much as 80% in later adulthood independent of dementia (Panizzon et al., 2014); other results suggest a decline to about 60% after age 80 (Lee, Henry, Trollor, & Sachdev, 2010), but another study suggests no change in later life (McGue & Christensen, 2013).

Figure 3. A meta-analysis of 11,000 pairs of twins showed that heritability (A) of intelligence increases significantly from childhood (age 9) to adolescence (age 12) and to young adulthood (age 17). Estimates of shared environmental influence (C) decreased significantly from childhood to adolescence. Nonshared environment (E) showed no change. (From Haworth et al., 2010.)

Increasing heritability for intelligence is interesting because other domains such as personality do not show systematic changes in heritability during development ( Turkheimer et al., 2014 ); reasons for this difference in results are not known. However, a meta-analysis of seven behavioral domains other than intelligence found significant increases in heritability for externalizing and internalizing behavior problems and social attitudes during adolescence and young adulthood ( Bergen, Gardner, & Kendler, 2007 ). There was no evidence for significant decreases in heritability, suggesting that when heritability changes in development, it increases, although the evidence is not as compelling as it is for intelligence.

Why does heritability of intelligence increase throughout development? Increasing heritability could be due to new genetic influences coming on line, a process called innovation , which would seem reasonable given the changes in brain structure and function that occur during development. However, the next finding, about age-to-age genetic stability, suggests a less obvious reason for the developmental increase in heritability.

6. Age-to-age stability is mainly due to genetics

Longitudinal genetic studies consistently show that phenotypic correlations from age to age are largely due to genetic stability. In other words, genetic effects contribute to continuity (the same genes affect the trait across age), whereas age-to-age change is primarily the province of environmental factors (Plomin, 1986). Longitudinal genetic analysis is a variant of multivariate genetic analysis (see Finding 4) applied to the phenotypic covariance across time for the ‘same’ trait. Such research has shown that phenotypic stability from age to age is mainly due to genetics for personality, psychopathology and intelligence, domains for which the most longitudinal genetic data are available.

For personality, the first report of a longitudinal genetic analysis over an age span of a decade concluded that 80% of the phenotypic stability was mediated genetically ( McGue, Bacon, & Lykken, 1993 ), which has been confirmed in recent meta-analyses ( Briley & Tucker-Drob, 2014 ; Turkheimer et al., 2014 ). For psychopathology, fewer longitudinal genetic studies are available but results are similar for diverse traits related to psychopathology such as borderline personality disorder ( Bornovalova, Hicks, Iacono, & McGue, 2009 ); antisocial personality disorder ( Burt, McGue, Carter, & Iacono, 2007 ); aggression (van Beijsterveldt, Bartels, Hudziak, & Boomsma, 2003); attention problems ( Rietveld, Hudziak, Bartels, Van Beijsterveldt, & Boomsma, 2004 ); withdrawn behavior ( Hoekstra, Bartels, Hudziak, Van Beijsterveldt, & Boomsma, 2008 ); anxiety and depression after childhood ( Kendler, Gardner, & Lichtenstein, 2008 ); and general internalizing and externalizing problems ( Bartels et al., 2004 ).

For intelligence, similar results have been reported, for example, in a meta-analysis of 15 longitudinal studies ( Tucker-Drob & Briley, 2014 ). This finding creates an apparent paradox: How can the heritability of intelligence increase so substantially throughout development if genetic effects are stable? That is, how can the same genes largely affect intelligence across the life course and yet genes account for more variance as time goes by? Increasing heritability despite genetic stability implies some contribution from what has been called genetic amplification ( Plomin & DeFries, 1985 ). In other words, genetic nudges early in development are magnified as time goes by, increasing heritability, but the same genetic propensities continue to affect behavior throughout the life course. This amplification model has recently been supported in a meta-analysis of 11,500 twin and sibling pairs with longitudinal data on intelligence, which found that a genetic amplification model fit the data better than a model in which new genetic influences arise across time ( Briley & Tucker-Drob, 2013 ). Genotype-environment correlation seems the most likely explanation in which small genetic differences are amplified as children select, modify and create environments correlated with their genetic propensities ( Scarr & McCartney, 1983 ). As mentioned earlier, all behavioral genetic results are limited by the samples, measures and methods employed, which means that such results could differ for example in different cultures.

This active model of selected environments—in contrast to the traditional model of imposed environments—offers a general paradigm for thinking about how genotypes become phenotypes ( Plomin, 1994 ). Genotype-environment correlation also predicts the next finding about genetic influence on ostensible measures of the environment.

7. Most measures of the ‘environment’ show significant genetic influence

Although it might seem a peculiar thing to do, measures of the environment widely used in psychological science – such as parenting, social support, and life events – can be treated as dependent measures in genetic analyses. If they are truly measures of the environment they should not show genetic influence. To the contrary, in 1991 a review of the first 18 studies using environmental measures as dependent measures in genetically sensitive designs showed evidence for genetic influence for these measures of the environment ( Plomin & Bergeman, 1991 ). Significant genetic influence was found for objective measures such as videotape observations of parenting as well as self-report measures of parenting, social support, and life events. How can measures of the environment show genetic influence? The reason is that such measures do not assess the environment ‘out there’ independent of the person. As noted above, we select, modify and create environments correlated with our genetic behavioral propensities such as personality and psychopathology ( McAdams, Gregory, & Eley, 2013 ). For example, in studies in which children are twins, parenting can reflect genetic differences in children’s characteristics such as their personality and psychopathology ( Avinun & Knafo, 2014 ; Klahr & Burt, 2014 ; Plomin, 1994 ).

In the 25 years since 1991, more than 150 papers using environmental measures in genetically sensitive designs have been published, consistently showing significant genetic influence on environmental measures, extending the findings from family environments to neighborhood, school, and work environments. A review of 55 independent genetic studies found an average heritability of 0.27 across 35 diverse environmental measures ( Kendler & Baker, 2007 ; confidence intervals not available). Meta-analyses of parenting, the most frequently studied domain, show genetic influence that is driven by child characteristics ( Avinun & Knafo, 2014 ) as well as by parent characteristics ( Klahr & Burt, 2014 ). Some exceptions have emerged. Not surprisingly, when life events are separated into uncontrollable events (e.g., death of a spouse) and controllable life events (e.g., financial problems), the former show nonsignificant genetic influence. As a reminder that all behavioral genetic results can differ in different cultures, a comparison of parenting in Japan and Sweden found that parenting in Japan showed more genetic influence than in Sweden, which is consistent with the view that parenting is more child centered in Japan than in the West ( Shikishima, Hiraishi, Yamagata, Neiderhiser, & Ando, 2012 ).

GCTA has begun to replicate these findings from twin studies. For example, GCTA has shown significant genetic influence on stressful life events (Power et al., 2013) and on variables often used as environmental measures in epidemiological studies such as years of schooling (Rietveld, Medland, et al., 2013). GCTA can also circumvent a limitation of twin studies when the twins are children. Such twin studies are limited to investigating within-family (twin-specific) experiences, whereas many important environmental factors such as SES are the same for two children in a family. However, GCTA can assess genetic influence on family environments such as SES that differ between families, not within families. GCTA has shown genetic influence on family SES (Trzaskowski et al., 2014) and an index of social deprivation (Marioni et al., 2014).

8. Most associations between environmental measures and psychological traits are significantly mediated genetically

If genetic factors affect environmental measures as well as behavioral measures, it is reasonable to ask the extent to which associations between environmental measures and behavioral measures are mediated genetically. For example, rather than assuming that correlations between parenting and children’s behavior are caused by the environmental effect of parenting on children’s behavior, it is important to consider the possibility that the correlation is in part due to genetic factors that influence both parenting and children’s behavior. Individual differences in parenting might reflect genetically driven differences in children’s behavior or differences in parenting might be due to genetically driven propensities of parents that are inherited directly by their children.

In 1985, using a parent-offspring adoption design, evidence emerged for genetic mediation that accounted on average for about half of the correlations between measures of home environment and infants’ development ( Plomin, Loehlin, & DeFries, 1985 ). For example, at age 2, the correlation between the Home Observation for Measurement of the Environment (HOME) and Bayley Mental Development Index was 0.44 in nonadoptive families, in which parents share nature as well as nurture with their offspring, as compared to 0.29 in adoptive families in which parents and offspring are genetically unrelated ( Plomin & DeFries, 1985 ). Similar results were available but not noticed in earlier adoption studies ( Burks, 1928 ; Leahy, 1935 ).

In twin studies, multivariate genetic analysis (see Finding 4 ) can be used to disentangle genetic and environmental effects from correlations between environmental measures and behavioral measures. As shown in Figure 4 , the first study of this type found that two-thirds of the correlation between maternal negativity and adolescent children’s antisocial behavior could be attributed to genetic factors ( Pike, McGuire, Hetherington, Reiss, & Plomin, 1996 ). More than a hundred studies have reported similar results, extending the findings to cross-lagged longitudinal analyses ( Burt, McGue, Krueger, & Iacono, 2005 ; Neiderhiser, Reiss, Hetherington, & Plomin, 1999 ) and to new designs such as the children-of-twins design ( Knopik et al., 2006 ; McAdams et al., 2014 ) and the combined parents-of-twins and extended children-of-twins design ( Narusyte et al., 2008 ).

Figure 4. Results of bivariate model-fitting analysis between mothers’ negativity and adolescents’ antisocial behavior. The paths are standardized partial regressions (all significant at p < .05) from the latent variables representing genetic (A) and shared (C) and nonshared (E) environmental effects on the measured variables. The genetic contribution to the phenotypic correlation is the product of the standardized paths 0.77 × 0.52 = 0.40. Calculated in the same way, the environmental contributions to the phenotypic correlation are 0.16 for C and 0.05 for E. The phenotypic correlation, 0.61, is the sum of these three contributions. The sample consisted of 719 families with same-sex adolescent sibling pairs including twins, full siblings, half siblings and unrelated siblings. (Adapted from Pike, McGuire, Hetherington, Reiss & Plomin, 1996.)
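The arithmetic in the caption can be retraced in a few lines. The sketch below uses only the values reported in the caption (the two standardized genetic paths and the C and E contributions); the function name is hypothetical and the code is an illustration of the path-tracing logic, not a reproduction of the original model-fitting.

```python
def decompose_correlation(a: float, c: float, e: float):
    """Sum the A, C and E contributions to a phenotypic correlation and
    return the total together with each component's share of it."""
    total = a + c + e
    shares = {"A": a / total, "C": c / total, "E": e / total}
    return total, shares


# Values taken from the Figure 4 caption.
genetic = 0.77 * 0.52   # product of the two standardized genetic paths
shared_env = 0.16       # C contribution, as reported
nonshared_env = 0.05    # E contribution, as reported

phenotypic_r, shares = decompose_correlation(genetic, shared_env, nonshared_env)
print(f"phenotypic correlation: {phenotypic_r:.2f}")
print(f"share mediated genetically: {shares['A']:.0%}")
```

The genetic share of the correlation comes out at roughly two-thirds, which matches the summary given in the text for this study.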

GCTA is beginning to provide additional support for this finding. For example, bivariate GCTA has shown significant genetic mediation between family SES and children’s intelligence ( Trzaskowski et al., 2014 ) and educational performance ( Krapohl & Plomin, 2015 ). Showing genetic influence on family SES and its association with children’s intelligence and educational performance is less surprising than it might at first seem because family SES indexes parental education which also correlates substantially with parental intelligence.

It is important to disentangle genetic and environmental influences on correlations between environmental and behavioral measures for three reasons. First, if these correlations are mediated genetically, interpretations that assume environmental causation are wrong, which has important implications for intervention. Second, genetically sensitive designs can identify causal effects of the environment free of genetic confound ( Marceau et al., 2015 ). Third, genetic mediation of the association between environmental measures and behavioral traits is not just a nuisance that needs to be controlled. It suggests a general way of thinking about how genotypes develop into phenotypes, from a passive model of imposed environments to an active model of shaped experiences in which we select, modify and create experiences in part based on our genetic propensities.

9. Most environmental effects are not shared by children growing up in the same family

It is reasonable to think that growing up in the same family makes brothers and sisters similar psychologically, which is what developmental theorists from Freud onwards assumed. However, for most behavioral dimensions and disorders, it is genetics that accounts for similarity among siblings. Although environmental effects have a major impact (see Finding 2), the salient environmental influences do not make siblings growing up in the same family similar. The message is not that family experiences are unimportant but rather that the relevant experiences are specific to each child in the family. This finding was ignored when it was first noted (Loehlin & Nichols, 1976) and controversial when it was first highlighted (Plomin & Daniels, 1987a, 1987b), but it is now widely accepted because it has consistently replicated (Plomin, 2011; Turkheimer, 2000). The acceptance is so complete that the focus now is on finding any shared environmental influence (Buchanan, McGue, Keyes, & Iacono, 2009), for example, for personality (e.g., Matteson, McGue, & Iacono, 2013) and some aspects of childhood psychopathology (Burt, 2009, 2014). For instance, for antisocial behavior in adolescence, shared environment accounts for about 15% of the total phenotypic variance; however, even here nonshared environment accounts for more of the variance, about 40% in meta-analyses, although these estimates include variance due to error of measurement (Rhee & Waldman, 2002). Academic achievement consistently shows some shared environmental influence, presumably due to the effect of schools, although the effect is surprisingly modest in its magnitude (about 15% for English and 10% for Mathematics) given that this result is based on siblings growing up in the same family and being taught in the same school (Kovas, Haworth, Dale, & Plomin, 2007). An interesting developmental exception is that shared environmental influence is found for intelligence up until adolescence and then diminishes as adolescents begin to make their own way in the world, as shown in meta-analyses (Briley & Tucker-Drob, 2013; Haworth et al., 2010).

Progress in identifying specific sources of nonshared environmental effects has been slow ( Turkheimer & Waldron, 2000 ), although the MZ differences design is proving useful in detecting some nonshared effects controlling for genetic confounding ( Plomin, 2011 ). It seems likely that nonshared environmental effects are due to many experiences of small effect, analogous to Finding 3 (‘heritability is caused by many genes of small effect’). That is, rather than asking whether a monolithic factor like parental control is primarily responsible for nonshared effects, it might be necessary to consider many seemingly inconsequential experiences that are tipping points in children’s lives. The ‘gloomy prospect’ is that these could be idiosyncratic stochastic experiences – chance ( Plomin & Daniels, 1987a ). However, the basic finding that most environmental effects are not shared by children growing up in the same family remains one of the most far-reaching findings from behavioral genetics. It is important to reiterate that the message is not that family experiences are unimportant but rather that the salient experiences that affect children’s development are specific to each child in the family, not general to all children in the family.

10. Abnormal is normal

A fundamental question about common psychological disorders is the extent to which genetic and environmental effects on disorders are merely the quantitative extremes of the same genetic and environmental factors that affect the rest of the distribution. Or are common disorders qualitatively different from the normal range of behavior? There are thousands of rare single-gene disorders such as phenylketonuria (PKU), which causes intellectual disability and has a frequency of about 1 in 10,000. This is the way we often think about disorders – as qualitatively different from the normal range of behavior. However, disorders studied by psychologists are much more common, including learning disabilities and psychopathology such as schizophrenia, autism, and hyperactivity.

Quantitative genetic methods suggest that common disorders are the extremes of the same genetic factors responsible for heritability throughout the distribution, although the evidence is indirect and the methods are somewhat abstruse. After describing two quantitative genetic methods (DeFries-Fulker extremes analysis and liability-threshold model-fitting) that provide support for this conclusion, we consider DNA research that addresses this issue directly. The first quantitative genetic method is DeFries-Fulker (DF) extremes analysis, which assesses genetic links between the extremes and the normal range of variation by bringing together disorders and dimensions (DeFries & Fulker, 1985, 1988). Rather than assessing the genetic etiology of a disorder dichotomously using identical and fraternal twin concordance rates, DF extremes analysis assesses the extent to which the quantitative scores of identical and fraternal twin partners (cotwins) of selected index cases (probands) regress differentially to the population mean. In other words, to the extent that genetic influences are responsible for the difference between the probands and the rest of the population, cotwins should be more similar to the probands for identical twins than for fraternal twins. This comparison of identical and fraternal cotwin means yields an estimate of group heritability, an index of the extent to which the extreme scores of probands are due to genetic influences, and thereby provides a test of the hypothesis that the etiology of extreme scores differs from that of variation within the normal range. Consequently, finding significant group heritability implies that there are genetic links between the disorder, however assessed, and the quantitative trait. That is, if the measure of extremes (or a diagnosis) were not linked genetically to the quantitative trait, group heritability would be zero.
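The logic of comparing cotwin means can be illustrated with a simplified calculation. The sketch below assumes a single proband mean for both zygosity groups and estimates group heritability as twice the difference between identical and fraternal cotwin means, after expressing each as a proportion of the probands' deviation from the population mean. Published DF analyses use a regression formulation and standardize each zygosity group separately; the function name and all numerical values here are hypothetical.

```python
def group_heritability(proband_mean: float, mz_cotwin_mean: float,
                       dz_cotwin_mean: float, population_mean: float = 0.0) -> float:
    """Simplified DF extremes estimate: twice the difference between the
    MZ and DZ cotwin means, each scaled by the proband deviation."""
    deviation = proband_mean - population_mean
    if deviation == 0:
        raise ValueError("Probands must differ from the population mean.")
    # Proportion of the proband deviation retained by each cotwin group.
    mz_retained = (mz_cotwin_mean - population_mean) / deviation
    dz_retained = (dz_cotwin_mean - population_mean) / deviation
    return 2 * (mz_retained - dz_retained)


# Hypothetical standardized scores: probands selected 2 SD below the mean.
h2_group = group_heritability(proband_mean=-2.0,
                              mz_cotwin_mean=-1.6,
                              dz_cotwin_mean=-1.0)
print(f"group heritability: {h2_group:.2f}")
```

If identical and fraternal cotwins regress equally far toward the population mean, the estimate is zero, which corresponds to the case in which the extremes are not linked genetically to the quantitative trait.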

DF extremes analysis was developed to assess reading disability in the context of reading ability ( DeFries, Fulker, & LaBuda, 1987 ). Research using the method has consistently shown that group heritabilities are substantial for cognitive disability such as language, mathematical and general learning disability, as well as reading disability ( Plomin & Kovas, 2005 ). An interesting exception involves severe intellectual disability (IQ < 70), which DF extremes analysis suggests is etiologically distinct from the normal distribution of intelligence ( Reichenberg et al., in press ).

Another quantitative genetic technique, called liability-threshold model-fitting, relies on dichotomous data such as diagnoses. It assumes that liability is distributed normally but that the disorder occurs only when a certain threshold of liability is exceeded. Liability-threshold model-fitting estimates heritability of liability but this is not the heritability of the disorder as assessed quantitatively – it is the heritability of a hypothetical construct of continuous liability derived from dichotomous data. Nonetheless, if all the assumptions of the liability-threshold model are correct for a particular disorder, it will yield results similar to the DF extremes analysis to the extent that the quantitative dimension assessed underlies the qualitative disorder. For cognitive disabilities and abilities, liability-threshold analyses yield estimates of heritability similar to DF extremes analysis ( Plomin & Kovas, 2005 ). Similar results from DF extremes analysis and liability-threshold model-fitting have been found for psychopathology ( Robinson, Neale, & Daly, 2015 ; for recent examples, see Zavos et al., 2014 ). In this way, these two quantitative genetic methods – DeFries-Fulker extremes analysis and liability-threshold model-fitting – lead to the conclusion that common disorders represent the extremes of the same genetic influences responsible for heritability throughout the distribution.
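The core assumption of the liability-threshold model, that a dichotomous diagnosis reflects a cut on a normally distributed liability, can be sketched as follows. The example converts a population prevalence into the corresponding threshold on a standard normal liability scale. The prevalence value is hypothetical, and the sketch covers only the threshold step, not the model-fitting that estimates heritability of liability.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Threshold (in SD units) on a standard normal liability scale above
    which individuals are assumed to be affected."""
    if not 0 < prevalence < 1:
        raise ValueError("Prevalence must lie strictly between 0 and 1.")
    return NormalDist().inv_cdf(1 - prevalence)


# Hypothetical prevalence of 1%, for illustration only.
threshold = liability_threshold(0.01)
print(f"threshold: {threshold:.2f} SD above the mean liability")
```

A rarer disorder implies a higher threshold, so the same dichotomous data place affected individuals further out in the tail of the assumed liability distribution.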

DNA research can address this issue directly: Genes associated with disorders are expected to be associated with dimensions and vice versa. Although evidence for replicable genetic associations is just emerging for complex traits, the data are consistent with this prediction (Plomin, Haworth, & Davis, 2009). For example, a polygenic score derived from a GWA of ADHD cases and controls significantly predicted an ADHD trait measure in the general population (Groen-Blokhuis et al., 2014; Martin, Hamshere, Stergiakouli, O’Donovan, & Thapar, 2014) and vice versa (Stergiakouli et al., 2015).

As mentioned earlier, most DNA research to date relies on common SNPs, which yield small effects, but it is possible that other types of DNA variants yield larger effects. Nonetheless, based on what we know now relying on common SNPs, it seems safe to hypothesize that most common disorders are at the genetic extreme of the spectrum of normal trait variation. This seems a safe hypothesis because heritability of complex traits and common disorders is caused by many genes of small effect ( Finding 3 ), which implies that together these genetic effects will contribute to a quantitative distribution, as Fisher (1918) assumed, even though each gene is inherited in the discrete manner hypothesized by Mendel (1866) . Empirical support for Fisher’s prediction is emerging from genome-wide association studies that detect many associations of small effect (see Finding 3 ). Although the individual effects of these associations are tiny, their effects can be aggregated in ‘polygenic’ scores, like summing items on a test ( Wray et al., 2014 ). These polygenic scores are distributed normally, as Fisher anticipated ( Plomin et al., 2009 ). The normal distribution of polygenic scores suggests that what we call disorders are the quantitative extreme of the same genetic factors that affect the rest of the distribution. Stated more provocatively, there are no common disorders, just quantitative traits – the abnormal is normal. This finding supports the recently adopted NIMH Research Domain Criteria strategy that focuses on dimensional models of psychopathology rather than diagnostic categories ( Insel et al., 2010 ).
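Fisher's point, that many discretely inherited variants of small effect add up to an approximately normal quantitative distribution, can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions: variants are independent, allele frequencies and effect sizes are drawn at random, and all parameter values and function names are hypothetical. It is not based on any real GWAS results.

```python
import random
from statistics import mean, stdev


def simulate_polygenic_scores(n_people=2000, n_variants=500, seed=1):
    """Sum small weighted effects over many independent variants."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_variants)]   # allele frequencies
    weights = [rng.gauss(0, 0.01) for _ in range(n_variants)]    # tiny effect sizes
    scores = []
    for _ in range(n_people):
        score = 0.0
        for p, w in zip(freqs, weights):
            # Each person carries 0, 1 or 2 copies of the allele (discrete inheritance).
            allele_count = int(rng.random() < p) + int(rng.random() < p)
            score += w * allele_count
        scores.append(score)
    return scores


def share_within_one_sd(scores):
    """Proportion of scores within one SD of the mean (about 0.68 if normal)."""
    m, s = mean(scores), stdev(scores)
    return sum(abs(x - m) <= s for x in scores) / len(scores)


scores = simulate_polygenic_scores()
print(f"share within 1 SD of the mean: {share_within_one_sd(scores):.2f}")
```

Although each variant takes only three discrete values, the aggregated score behaves like a continuous, roughly bell-shaped trait, with no natural break separating an 'affected' tail from the rest of the distribution.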

There is also a less obvious implication. Polygenic scores are typically referred to as polygenic risk scores because their constituent associations were derived from case-control studies comparing a group of individuals diagnosed with a disorder and controls. However, this ‘risk’ label misses the point that because these polygenic scores are distributed normally their distribution has a positive end as well as a negative end. This opens up opportunities for considering positive genetics – how children flourish rather than flounder, and resilience rather than vulnerability (Plomin et al., 2009).

Why do behavioral genetic results replicate?

Dozens of papers have considered general reasons for false positive publications in science ( Ioannidis, 2005b , 2014 ) including psychological science ( Pashler & Wagenmakers, 2012 ). Much of the discussion has concentrated on problems related to null-hypothesis significance testing and ‘chasing p values’ ( Cumming, 2014 ). The prescribed remedy is the ‘new statistics’: estimating effect sizes, power, and meta-analysis. These new statistics are relevant to replication in behavioral genetics, as mentioned below.

Other issues proposed as risk factors for false positive findings include questionable research practices such as flexibility in analytic procedures ( Simmons, Nelson, & Simonsohn, 2011 ), academic issues such as a hypercompetitive culture for publishing, and publication issues such as bias towards novel and positive results ( Ioannidis, Munafo, Fusar-Poli, Nosek, & David, 2014 ). Although it is possible that behavioral genetic studies are less subject to publication bias because finding low heritability is as interesting as finding high heritability, these risk factors also affect studies in behavioral genetics. Thus, they cannot explain why the top-10 behavioral genetic findings replicate.

Here we go beyond these general risk factors, which have been widely discussed, to suggest five reasons for replication that appear to be specific to behavioral genetics. Because these reasons are specific to behavioral genetics, they are not a panacea for replicating results in other fields, although we suggest ways in which these issues might be relevant.

Controversy

Modern genetic research in psychology began 150 years ago with the work of Francis Galton, who coined the phrase ‘nature and nurture’ (Galton, 1869), which launched psychology’s major conflict of the twentieth century (Pinker, 2002). We suggest that the controversy and conflict surrounding behavioral genetics had the positive effect of motivating bigger and better studies that met the high standard of evidence needed to convince sceptical psychological scientists of the importance of genetics in the development of individual differences in behavior. A single study was not enough – it was the convergence of evidence across studies using different methods that tipped the balance of opinion.

The relevance for other embattled fields is the comfort of knowing that the extra effort required to address scepticism and criticism can pay off in building a stronger foundation for a field.

The new statistics are not new to behavioral genetics

Most of the concern about failures to replicate relates to experiments that test for significant mean differences between experimental and control groups. Null-hypothesis significance testing (NHST) and p values have been central to the experimental approach rather than estimation of effect sizes, confidence intervals, and power ( Cumming, 2014 ). As a result, experimental research has often relied on sample sizes that are underpowered to detect reasonable effect sizes, and thus published results are at increased risk of being false positives ( Marszalek, Barber, Kohlhart, & Holmes, 2011 ), especially in cognitive science ( Ioannidis, 2014 ) and neuroscience ( Button et al., 2013 ), where most research is experimental and sample sizes tend to be small.

In contrast, human behavioral genetic research does not experimentally manipulate genes or environments or randomly assign participants to groups (although nonhuman animal research can do both). Its purview is naturally occurring differences between individuals, a perspective shared with research on psychopathology, personality and cognitive abilities and disabilities. What is unique in behavioral genetics is its attempt to estimate the extent to which observed variance can be attributed to genetic and environmental components of variance.

Focusing on naturally occurring variability does not ensure replicable results; indeed, the demands for power are far greater for individual differences research than for detecting mean differences between groups. However, the essential statistics of individual differences, variance and covariance, are effect size indicators and have forced behavioral geneticists to face issues about estimating effect size, confidence intervals, and power.

Other fields are likely to profit from considering individual differences as well as group differences.

Focusing on the net effect of genetic and environmental influences

Another important factor that contributes to the replicability of behavioral genetic results is that the field partitions total phenotypic variance into genetic and environmental components of variance rather than identifying specific genes or specific environmental factors. That is, heritability indexes the net effect of all inherited DNA differences on phenotypic variance, regardless of the number or effect size of individual DNA variants or the complexity of mechanisms by which they affect the trait. Point estimates of heritability vary but reliability is found within the confidence intervals of these estimates.

In contrast, attempts to identify specific genes associated with complex traits have been much more difficult to replicate because the number of genes responsible for heritability is so large and their individual effects are so small (Chabris et al., 2012; Chabris et al., in press). Rather than trying to identify individual SNPs of small effect size that need to reach statistical significance in the face of massive multiple testing of millions of SNPs (typically p < 5 × 10⁻⁸), greater success is beginning to be achieved with polygenic scores that aggregate small effects across the genome for thousands of SNPs even though the individual SNPs are not significantly associated with the trait (Plomin & Simpson, 2013). For example, Figure 5 shows the association between polygenic scores for educational attainment (college yes or no) and scores on a test of mathematics achievement in 16-year-olds (Krapohl et al., 2015). Polygenic scores were calculated for 3000 16-year-olds based on results from a GWAS of educational attainment with 120,000 adults (Rietveld et al., 2013). Although the polygenic scores accounted for only 2% of the variance in math scores (r = 0.15, s.e. = 0.02), the top and bottom septiles differed by half a standard deviation. This suggests that, even with modest effect sizes, polygenic scores can be used to select low and high genotypic extremes for intensive and expensive research, such as clinical or neuroscience research.

Figure 5. A polygenic score based on a GWAS of educational attainment in adults correlates 0.15 with mathematics scores at age 16. (Adapted from Krapohl et al., 2015.)
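It may seem surprising that a correlation of 0.15 (about 2% of the variance) yields a difference of roughly half a standard deviation between the top and bottom septiles. The following sketch shows that this is what would be expected if polygenic scores and mathematics scores were bivariate normal. That distributional assumption and the function name are ours for illustration; this is not the analysis reported by Krapohl et al. (2015).

```python
from statistics import NormalDist


def extreme_group_gap(r: float, n_groups: int) -> float:
    """Expected outcome difference (in SD units) between the top and bottom
    of n_groups equal-sized score groups, assuming bivariate normality."""
    std_normal = NormalDist()
    tail = 1 / n_groups
    cutoff = std_normal.inv_cdf(1 - tail)
    # Mean standardized score in the top group: E[z | z > cutoff].
    mean_score_top = std_normal.pdf(cutoff) / tail
    # The outcome regresses on the score with slope r; the bottom group mirrors the top.
    return 2 * r * mean_score_top


r = 0.15
print(f"variance explained: {r ** 2:.1%}")
print(f"top vs bottom septile gap: {extreme_group_gap(r, 7):.2f} SD")
```

Selecting extremes magnifies a modest correlation because individuals in the outer septiles lie about 1.6 standard deviations from the mean on the polygenic score.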

Similarly, it has been difficult to pin down specific environmental factors responsible for the large nonshared environmental component of variance for behavioral traits ( Plomin, 2011 ; Turkheimer et al., 2014 ). However, in the case of nonshared environment there is nothing analogous to polygenic scores that make it possible to aggregate small effects.

Other fields might also profit from considering approaches analogous to components of variance or polygenic scores.

Incentives for replication and meta-analysis

Replication is key to a progressive science that produces a steady accumulation of knowledge. Lack of incentives for publishing replication studies is a major culprit in the perseverance of false positive findings ( Bakker, van Dijk, & Wicherts, 2012 ). Behavioral genetic research has been conducive to replication, not so much in pursuit of lofty ideals of a progressive science as for more mundane reasons. One reason is that behavioral genetic research often involves large representative samples of difficult-to-obtain individuals such as twins and adoptees. This creates opportunities for replication and meta-analysis across studies and across countries because, once such expensive long-term studies are created, they often collect data on a wide range of psychological traits, which means that many studies have data on similar traits ( Hur & Craig, 2013 ).

A different reason prevails for replication in behavioral genetic studies that select individuals on the basis of a diagnosis such as schizophrenia. Here the importance of the basic question of nature and nurture drives researchers in different countries to conduct separate studies of these disorders, which also sets the stage for replication and meta-analysis.

The DNA revolution has greatly accelerated this trend toward replication and meta-analysis in behavioral genetics research. DNA analysis can be applied to unrelated individuals – that is, it does not require special samples of twins and adoptees – which means that many studies with overlapping assessments are available for meta-analysis. It is widely accepted that heritability is caused by many genes of small effect (Finding 3) and that one way to increase power to detect such small effects is to increase sample size by creating consortia for purposes of meta-analysis of summary statistics and, increasingly, mega-analysis of raw data (Gelernter, 2015).

In relation to other fields, the recent turmoil concerning false positive findings has led to top-down recommendations for changing the incentives for replication and meta-analysis ( Nosek, Spies, & Motyl, 2012 ; Stanley & Spence, 2014 ). Although these top-down recommendations are important, bottom-up approaches to collaboration and meta-analysis that coincide with researchers’ own needs – in this case, the need to achieve greater power to detect smaller effects – are likely to be practical and powerful incentives.

Genetic effect sizes are large

The most important reason for the reproducibility of behavioral genetic results is that genetic effect sizes are large. Heritabilities for behavioral traits, typically 30% to 50%, are by far the largest effect sizes in psychological science. What other findings in psychological science account for 5% of the variance, let alone 50%? Consider sex differences as one of countless examples. Although thousands of papers report significant sex differences in psychological traits, a general rule is that sex differences account for less than 1% of the variance ( Hyde, 2014 ).
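To put group differences and heritabilities on the same scale, a standardized mean difference (Cohen's d) can be converted to a proportion of variance explained. The sketch below uses the standard conversion for two groups of equal size, r² = d² / (d² + 4); the function name is ours, and the d values are illustrative rather than taken from the studies cited.

```python
def variance_explained_from_d(d):
    """Convert Cohen's d for two equal-sized groups into the proportion
    of variance explained (the squared point-biserial correlation)."""
    return d ** 2 / (d ** 2 + 4.0)


# Illustrative effect sizes, from small to very large.
for d in (0.1, 0.2, 0.5, 2.0):
    share = variance_explained_from_d(d)
    print(f"d = {d:.1f} -> variance explained = {share:.1%}")
```

Under this conversion, a difference of d = 0.2 explains just under 1% of the variance, whereas a mean difference of two full standard deviations (d = 2.0) would be needed to explain 50%.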

In retrospect, it is amazing that inherited DNA differences can work their way through the complexities of pathways from genes to brain to behavior and end up accounting for so much of the variance of complex psychological traits. These large heritabilities were lucky for behavioral genetics because earlier studies would have been underpowered to detect more modest heritabilities. As an extreme example, heritabilities of 5% would require twin samples in the tens of thousands to reach 80% power to detect them ( Visscher, 2004 ; http://genepi.qimr.edu.au/general/TwinPowerCalculator/ ).
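The order of magnitude of that sample size can be checked with a rough calculation. The sketch below is not the likelihood-based method behind the cited calculator; it is a simpler approximation that assumes the classical twin model (r_MZ = h² + c², r_DZ = h²/2 + c²), equal numbers of MZ and DZ pairs, and a two-sided Fisher z test of the difference between the two twin correlations. The function name and defaults are our own.

```python
import math
from statistics import NormalDist


def pairs_needed_per_zygosity(h2, c2=0.0, alpha=0.05, power=0.80):
    """Approximate number of MZ pairs (and, equally, DZ pairs) needed to
    detect heritability h2 by testing whether r_MZ differs from r_DZ."""
    r_mz = h2 + c2          # expected MZ twin correlation
    r_dz = h2 / 2.0 + c2    # expected DZ twin correlation
    # Difference between the correlations on Fisher's z scale.
    delta = math.atanh(r_mz) - math.atanh(r_dz)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    # Var(z_MZ - z_DZ) = 2 / (n - 3) when both groups contain n pairs.
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / delta ** 2 + 3)


for h2 in (0.05, 0.40):
    n = pairs_needed_per_zygosity(h2)
    print(f"h2 = {h2:.2f}: about {n} MZ pairs and {n} DZ pairs")
```

Under these assumptions, a heritability of 5% calls for roughly 25,000 pairs of each zygosity, whereas a heritability of 40% needs only a few hundred, consistent with the point that early twin studies could detect large heritabilities but not modest ones.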

Conclusions

Discovering such big and often counterintuitive findings is a cause for celebration in psychology, especially coming from behavioral genetics, which has been so controversial during the past century. These findings have begun to change the received psychological perspective about the origins of individual differences in behavior. During the past century, the pendulum of opinion has swung from nature to nurture and is now swinging back towards nature. We hope that this research has stopped the pendulum at a point between nature and nurture because the most basic message ( Findings 1 and 2 ) is that both genetics and environment contribute substantially to individual differences in psychological traits. It is worth noting again that four of these findings are primarily about the environment rather than genetics, which emphasizes the value of studying environmental influences in genetically sensitive designs.

What we like best about some of these findings is that they are counterintuitive. For example, who would have thought that the heritability of intelligence increases throughout development ( Finding 5 ) or that environmental measures show genetic influence ( Finding 7 ) or that the abnormal is normal ( Finding 10 )? Another feature of these findings is that each is falsifiable. For example, if major-gene effects on complex traits and common disorders are found, they would falsify the hypothesis that heritability is caused by many genes of small effect ( Finding 3 ).

We also speculated about why behavioral genetic results replicate, suggesting possible reasons that are specific to behavioral genetics. For example, the controversies that permeated the field during the past century raised the bar for the quality and quantity of research needed to convince people of the importance of genetics throughout psychology. Another reason we described is that behavioral genetic research is conducive to replication for several practical reasons rather than for lofty ideals of a progressive science. However, having worked in the field for several decades, we have found it to be imbued with an ethos of building a progressive science based on replicable findings. It is crucial to build from this firm foundation of replicable findings, and the most difficult task lies ahead: understanding the actual processes that mediate these replicable findings. What we have learned about the genetic and environmental architecture hints at just how difficult this will be, because heritability is caused by many genes of small effect ( Finding 3 ) and most environmental effects are not shared by children growing up in the same family ( Finding 9 ).

Funding Acknowledgment

RP is a UK Medical Research Council Research Professor [G19/2] and European Research Council Advanced Investigator Award holder [295366].

Contributor Information

Robert Plomin, King’s College London.

John C. DeFries, University of Colorado.

Valerie S. Knopik, Rhode Island Hospital and Brown University.

Jenae M. Neiderhiser, The Pennsylvania State University.

Reference List

  • Avinun R, Knafo A. Parenting as a reaction evoked by children's genotype: A meta-analysis of children-as-twins studies. Personality and Social Psychology Review. 2014; 18 :87–102. doi: 10.1177/1088868313498308. [ PubMed ] [ Google Scholar ]
  • Bakker M, van Dijk A, Wicherts JM. The rules of the game called psychological science. Perspectives on Psychological Science. 2012; 7 :543–554. doi: 10.1177/1745691612459060. [ PubMed ] [ Google Scholar ]
  • Baltes PB, Reese HW, Lipsitt LP. Life-span developmental psychology. Annual Review of Psychology. 1980; 31 :65–110. doi: 10.1146/annurev.ps.31.020180.000433. [ PubMed ] [ Google Scholar ]
  • Bartels M. Genetics of wellbeing and its components satisfaction with life, happiness, and quality of life: A review and meta-analysis of heritability studies. Behavior Genetics. 2015; 45 :137–156. doi: 10.1007/s10519-015-9713-y. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bartels M, van den Oord EJ, Hudziak JJ, Rietveld MJ, Van Beijsterveldt CE, Boomsma DI. Genetic and environmental mechanisms underlying stability and change in problem behaviors at ages 3, 7, 10, and 12. Developmental Psychology. 2004; 40 :852–867. doi: 10.1037/0012-1649.40.5.852. [ PubMed ] [ Google Scholar ]
  • Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012; 483 (7391):531–533. doi: 10.1038/483531a. [ PubMed ] [ Google Scholar ]
  • Benyamin B, Pourcain B, Davis OS, Davies G, Hansell NK, Brion MJ, Visscher PM. Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Molecular Psychiatry. 2014; 19 :253–258. doi: 10.1038/mp.2012.184. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bergen SE, Gardner CO, Kendler KS. Age-related changes in heritability of behavioral phenotypes over adolescence and young adulthood: A meta-analysis. Twin Research and Human Genetics. 2007; 10 :423–433. doi: 10.1375/twin.10.3.423. [ PubMed ] [ Google Scholar ]
  • Boekel W, Wagenmakers EJ, Belay L, Verhagen J, Brown S, Forstmann BU. A purely confirmatory replication study of structural brain-behavior correlations. Cortex. 2015; 66 :115–133. doi: 10.1016/j.cortex.2014.11.019. [ PubMed ] [ Google Scholar ]
  • Bornovalova MA, Hicks BM, Iacono WG, McGue M. Stability, change, and heritability of borderline personality disorder traits from adolescence to adulthood: A longitudinal twin study. Development and Psychopathology. 2009; 21 :1335–1353. doi: 10.1017/s0954579409990186. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bouchard TJ, Jr., McGue M. Familial studies of intelligence: A review. Science. 1981; 212 (4498):1055–1059. doi: 10.1126/science.7195071. [ PubMed ] [ Google Scholar ]
  • Briley DA, Tucker-Drob EM. Explaining the increasing heritability of cognitive ability across development: A meta-analysis of longitudinal twin and adoption studies. Psychological Science. 2013; 24 :1704–1713. doi: 10.1177/0956797613478618. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Briley DA, Tucker-Drob EM. Genetic and environmental continuity in personality development: A meta-analysis. Psychological Bulletin. 2014; 140 :1303–1331. doi: 10.1037/a0037091. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Buchanan JP, McGue M, Keyes M, Iacono WG. Are there shared environmental influences on adolescent behavior? Evidence from a study of adoptive siblings. Behavior Genetics. 2009; 39 :532–540. doi: 10.1007/s10519-009-9283-y. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Burks B. The relative influence of nature and nurture upon mental development: A comparative study on foster parent-foster child resemblance. Yearbook of the National Society for the Study of Education, Part 1. 1928; 27 :219–316. [ Google Scholar ]
  • Burt SA. Rethinking environmental contributions to child and adolescent psychopathology: A meta-analysis of shared environmental influences. Psychological Bulletin. 2009; 135 :608–637. doi: 10.1037/a001570. [ PubMed ] [ Google Scholar ]
  • Burt SA. Research review: The shared environment as a key source of variability in child and adolescent psychopathology. Journal of Child Psychology and Psychiatry. 2014; 55 :304–312. doi: 10.1111/jcpp.12173. [ PubMed ] [ Google Scholar ]
  • Burt SA, McGue M, Carter LA, Iacono WG. The different origins of stability and change in antisocial personality disorder symptoms. Psychological Medicine. 2007; 37 :27–38. doi: 10.1017/S0033291706009020. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Burt SA, McGue M, Krueger RF, Iacono WG. How are parent-child conflict and childhood externalizing symptoms related over time? Results from a genetically informative cross-lagged study. Development and Psychopathology. 2005; 17 :145–165. doi: 10.1017/S095457940505008X. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, Munafo MR. Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience. 2013; 14 :365–376. doi: 10.1038/nrn3475. [ PubMed ] [ Google Scholar ]
  • Cardno AG, Rijsdijk FV, West RM, Gottesman II, Craddock N, Murray RM, McGuffin P. A twin study of schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes. American Journal of Medical Genetics. B: Neuropsychiatric Genetics. 2012; 159 :172–182. doi: 10.1002/ajmg.b.32011. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Cerda M, Sagdeo A, Johnson J, Galea S. Genetic and environmental influences on psychiatric comorbidity: A systematic review. Journal of Affective Disorders. 2010; 126 :14–38. doi: 10.1016/j.jad.2009.11.006. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Chabris CF, Hebert BM, Benjamin DJ, Beauchamp J, Cesarini D, van der Loos M, Laibson D. Most reported genetic associations with general intelligence are probably false positives. Psychological Science. 2012; 23 :1314–1323. doi: 10.1177/0956797611435528. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Chabris C, Lee J, Cesarini D, Benjamin D, Laibson D. The fourth law of behavior genetics. Current Directions in Psychological Science. 2015; 24 :304–312. doi: 10.1177/0963721415580430. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Check Hayden E. Ethics: Taboo genetics. Nature. 2013; 502 :26–28. doi: 10.1038/502026a. [ PubMed ] [ Google Scholar ]
  • Chipuer HM, Rovine MJ, Plomin R. LISREL modeling: Genetic and environmental influences on IQ revisited. Intelligence. 1990; 14 :11–29. doi: 10.1016/0160-2896(90)90011-H. [ Google Scholar ]
  • Cross-Disorder Group of the Psychiatric Genomics Consortium. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature Genetics. 2013a; 45 :984–994. doi: 10.1038/ng.2711. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: A genome-wide analysis. Lancet. 2013b; 381 (9875):1371–1379. doi: 10.1016/s0140-6736(12)62129-1. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Cumming G. The new statistics: Why and how. Psychological Science. 2014; 25 :7–29. doi: 10.1177/0956797613504966. [ PubMed ] [ Google Scholar ]
  • Davies G, Armstrong N, Bis JC, Bressler J, Chouraki V, Giddaluru S, Deary IJ. Genetic contributions to variation in general cognitive function: A meta-analysis of genome-wide association studies in the CHARGE consortium (N=53 949). Molecular Psychiatry. 2015; 20 :183–192. doi: 10.1038/mp.2014.188. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Davis LK, Yu D, Keenan CL, Gamazon ER, Konkashbaev AI, Derks EM, Scharf JM. Partitioning the heritability of Tourette syndrome and obsessive compulsive disorder reveals differences in genetic architecture. PLoS Genetics. 2013; 9 :e1003864. doi: 10.1371/journal.pgen.1003864. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Davis OSP, Haworth CMA, Plomin R. Learning abilities and disabilities: Generalist genes in early adolescence. Cognitive Neuropsychiatry. 2009; 14 :312–331. doi: 10.1080/13546800902797106. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • DeFries JC, Fulker DW. Multiple regression analysis of twin data. Behavior Genetics. 1985; 15 :467–473. doi: 10.1007/BF01066239. [ PubMed ] [ Google Scholar ]
  • DeFries JC, Fulker DW. Multiple regression analysis of twin data: Etiology of deviant scores versus individual differences. Acta Geneticae Medicae et Gemellologiae. 1988; 37 :205–216. doi: 10.1017/S0001566000003810. [ PubMed ] [ Google Scholar ]
  • DeFries JC, Fulker DW, LaBuda MC. Evidence for a genetic aetiology in reading disability of twins. Nature. 1987; 329 (6139):537–539. doi: 10.1038/329537a0. [ PubMed ] [ Google Scholar ]
  • DeFries JC, Gervais MC, Thomas EA. Response to 30 generations of selection for open-field activity in laboratory mice. Behavior Genetics. 1978; 8 :3–13. doi: 10.1007/BF01067700. [ PubMed ] [ Google Scholar ]
  • Devlin B, Daniels M, Roeder K. The heritability of IQ. Nature. 1997; 388 (6641):468–471. doi: 10.1038/41319. [ PubMed ] [ Google Scholar ]
  • Doherty JL, Owen MJ. Genomic insights into the overlap between psychiatric disorders: implications for research and clinical practice. Genome Medicine. 2014; 6 :29. doi: 10.1186/gm546. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Farrell MS, Werge T, Sklar P, Owen MJ, Ophoff RA, O'Donovan MC, Sullivan PF. Evaluating historical candidate genes for schizophrenia. Molecular Psychiatry. 2015; 20 :555–562. doi: 10.1038/mp.2015.16. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh. 1918; 52 :399–433. doi: 10.1017/S0080456800012163. [ Google Scholar ]
  • Galton F. Hereditary genius: An enquiry into its laws and consequences. World; Cleveland, OH: 1869. [ Google Scholar ]
  • Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP, Lee AB, Buxbaum JD. Most genetic risk for autism resides with common variation. Nature Genetics. 2014; 46 :881–885. doi: 10.1038/ng.3039. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Gelernter J. Genetics of complex traits in psychiatry. Biological Psychiatry. 2015; 77 :36–42. doi: 10.1016/j.biopsych.2014.08.005. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Gratten J, Wray NR, Keller MC, Visscher PM. Large-scale genomics unveils the genetic architecture of psychiatric disorders. Nature Neuroscience. 2014; 17 :782–790. doi: 10.1038/nn.3708. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Groen-Blokhuis MM, Middeldorp CM, Kan K-J, Abdellaoui A, van Beijsterveldt CEM, Ehli EA, Boomsma DI. Attention-deficit/hyperactivity disorder polygenic risk scores predict attention problems in a population-based sample of children. Journal of the American Academy of Child and Adolescent Psychiatry. 2014; 53 :1123–1129. doi: 10.1016/j.jaac.2014.06.014. [ PubMed ] [ Google Scholar ]
  • Haworth CMA, Plomin R. Genetics and education: Towards a genetically sensitive classroom. In: Harris KR, Graham S, Urdan T, editors. The American Psychological Association Handbook of Educational Psychology. APA; Washington, DC: 2011. pp. 529–559. [ Google Scholar ]
  • Haworth CMA, Wright MJ, Luciano M, Martin NG, de Geus EJC, van Beijsterveldt CEM, Plomin R. The heritability of general cognitive ability increases linearly from childhood to young adulthood. Molecular Psychiatry. 2010; 15 :1112–1120. doi: 10.1038/mp.2009.55. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hoekstra RA, Bartels M, Hudziak JJ, Van Beijsterveldt TC, Boomsma DI. Genetic and environmental influences on the stability of withdrawn behavior in children: A longitudinal, multi-informant twin study. Behavior Genetics. 2008; 38 :447–461. doi: 10.1007/s10519-008-9213-4. [ PubMed ] [ Google Scholar ]
  • Huppertz C, Bartels M, Jansen IE, Boomsma DI, Willemsen G, de Moor MHM, de Geus EJC. A twin-sibling study on the relationship between exercise attitudes and exercise behavior. Behavior Genetics. 2014; 44 :45–55. doi: 10.1007/s10519-013-9617-7. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hur YM, Craig JM. Twin registries worldwide: An important resource for scientific research. Twin Research and Human Genetics. 2013; 16 :1–12. doi: 10.1017/thg.2012.147. [ PubMed ] [ Google Scholar ]
  • Hyde JS. Gender similarities and differences. Annual Review of Psychology. 2014; 65 :373–398. doi: 10.1146/annurev-psych-010213-115057. [ PubMed ] [ Google Scholar ]
  • Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, Wang P. Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders. American Journal of Psychiatry. 2010; 167 :748–751. doi: 10.1176/appi.ajp.2010.09091379. [ PubMed ] [ Google Scholar ]
  • Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005a; 294 :218–228. doi: 10.1001/jama.294.2.218. [ PubMed ] [ Google Scholar ]
  • Ioannidis JPA. Why most published research findings are false. PLoS Medicine. 2005b; 2 :e124. doi: 10.1371/journal.pmed.0020124. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ioannidis JPA. How to make more published research true. PLoS Medicine. 2014; 11 :e1001747. doi: 10.1371/journal.pmed.1001747. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ioannidis JP, Munafo MR, Fusar-Poli P, Nosek BA, David SP. Publication and other reporting biases in cognitive sciences: Detection, prevalence, and prevention. Trends in Cognitive Science. 2014; 18 :235–241. doi: 10.1016/j.tics.2014.02.010. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Jasny BR, Chin G, Chong L, Vignieri S. Data replication & reproducibility. Again, and again, and again ... Introduction. Science. 2011; 334 (6060):1225. doi: 10.1126/science.334.6060.1225. [ PubMed ] [ Google Scholar ]
  • Kavanagh DH, Tansey KE, O'Donovan MC, Owen MJ. Schizophrenia genetics: Emerging themes for a complex disorder. Molecular Psychiatry. 2015; 20 :72–76. doi: 10.1038/mp.2014.148. [ PubMed ] [ Google Scholar ]
  • Kendler KS, Aggen SH, Knudsen GP, Røysamb E, Neale MC, Reichborn-Kjennerud T. The structure of genetic and environmental risk factors for syndromal and subsyndromal common DSM-IV Axis I and all Axis II disorders. American Journal of Psychiatry. 2011; 168 :29–39. doi: 10.1176/appi.ajp.2010.10030340. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kendler KS, Baker JH. Genetic influences on measures of the environment: A systematic review. Psychological Medicine. 2007; 37 :615–626. doi: 10.1017/S0033291706009524. [ PubMed ] [ Google Scholar ]
  • Kendler KS, Gardner CO, Lichtenstein P. A developmental twin study of symptoms of anxiety and depression: Evidence for genetic innovation and attenuation. Psychological Medicine. 2008; 38 :1567–1575. doi: 10.1017/s003329170800384x. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kendler KS, Gatz M, Gardner CO, Pedersen NL. Personality and major depression: A Swedish longitudinal, population-based twin study. Archives of General Psychiatry. 2006; 63 :1113–1120. [ PubMed ] [ Google Scholar ]
  • Kendler KS, Prescott CA, Myers J, Neale MC. The structure of genetic and environmental risk factors for common psychiatric and substance use disorders in men and women. Archives of General Psychiatry. 2003; 60 :929–937. doi: 10.1001/archpsyc.60.9.929. [ PubMed ] [ Google Scholar ]
  • Klahr AM, Burt SA. Elucidating the etiology of individual differences in parenting: A meta-analysis of behavioral genetic research. Psychological Bulletin. 2014; 140 :544–586. doi: 10.1037/a0034205. [ PubMed ] [ Google Scholar ]
  • Klei L, Sanders SJ, Murtha MT, Hus V, Lowe JK, Willsey AJ, Devlin B. Common genetic variants, acting additively, are a major source of risk for autism. Molecular Autism. 2012; 3 :9. doi: 10.1186/2040-2392-3-9. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Knopik VS, Heath AC, Jacob T, Slutske WS, Bucholz KK, Madden PAF, Martin NG. Maternal alcohol use disorder and offspring ADHD: Disentangling genetic and environmental effects using a children-of-twins design. Psychological Medicine. 2006; 36 :1461–1471. doi: 10.1017/s0033291706007884. [ PubMed ] [ Google Scholar ]
  • Kovas Y, Haworth CMA, Dale PS, Plomin R. The genetic and environmental origins of learning abilities and disabilities in the early school years. Monographs of the Society for Research in Child Development. 2007; 72 :1–144. doi: 10.1111/j.1540-5834.2007.00453.x. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Krapohl E, Euesden J, Zabaneh D, Pingault J-B, Rimfeld K, von Stumm S, Dale PS, Breen G, O’Reilly PF, Plomin R. Phenome-wide analysis of genome-wide polygenic scores. Molecular Psychiatry. 2015 Advance online publication. doi: 10.1038/mp.2015.126. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Krapohl E, Plomin R. Genetic link between family socioeconomic status and children's educational achievement estimated from genome-wide SNPs. Molecular Psychiatry. 2015 Advance online publication. doi: 10.1038/mp.2015.2. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Leahy AM. Nature-nurture and intelligence. Genetic Psychology Monographs. 1935; 17 :236–308. [ Google Scholar ]
  • Ledgerwood A. Introduction to the special section on advancing our methods and practices. Perspectives on Psychological Science. 2014a; 9 :275–277. doi: 10.1177/1745691614529448. [ PubMed ] [ Google Scholar ]
  • Ledgerwood A. Introduction to the special section on moving toward a cumulative science: Maximizing what our research can tell us. Perspectives on Psychological Science. 2014b; 9 :610–611. doi: 10.1177/1745691614553989. [ PubMed ] [ Google Scholar ]
  • Lee T, Henry JD, Trollor JN, Sachdev PS. Genetic influences on cognitive functions in the elderly: A selective review of twin studies. Brain Research Reviews. 2010; 64 :1–13. doi: 10.1016/j.brainresrev.2010.02.001. [ PubMed ] [ Google Scholar ]
  • Lichtenstein P, Yip BH, Bjork C, Pawitan Y, Cannon TD, Sullivan PF, Hultman CM. Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: A population-based study. Lancet. 2009; 373 (9659):234–239. doi: 10.1016/s0140-6736(09)60072-6. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Loehlin JC. Partitioning environmental and genetic contributions to behavioral development. American Psychologist. 1989; 44 :1285–1292. doi: 10.1037/0003-066X.44.10.1285. [ PubMed ] [ Google Scholar ]
  • Loehlin JC. Genes and environment in personality development. Sage Publications Inc.; Newbury Park, CA: 1992. [ Google Scholar ]
  • Loehlin JC, Nichols J. Heredity, environment and personality: A study of 850 sets of twins. University of Texas; Austin, TX: 1976. [ Google Scholar ]
  • Lubke GH, Hottenga JJ, Walters R, Laurin C, de Geus EJ, Willemsen G, Boomsma DI. Estimating the genetic variance of major depressive disorder due to all single nucleotide polymorphisms. Biological Psychiatry. 2012; 72 :707–709. doi: 10.1016/j.biopsych.2012.03.011. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lubke GH, Laurin C, Amin N, Hottenga JJ, Willemsen G, van Grootheest G, Boomsma DI. Genome-wide analyses of borderline personality features. Molecular Psychiatry. 2014; 19 :923–929. doi: 10.1038/mp.2013.109. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Macleod MR, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis JPA, Glasziou P. Biomedical research: Increasing value, reducing waste. The Lancet. 2014; 383 (9912):101–104. doi: 10.1016/S0140-6736(13)62329-6. [ PubMed ] [ Google Scholar ]
  • Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009; 461 (7265):747–753. doi: 10.1038/nature08494. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Marceau K, Narusyte J, Lichtenstein P, Ganiban JM, Spotts EL, Reiss D, Neiderhiser JM. Parental knowledge is an environmental influence on adolescent externalizing. Journal of Child Psychology and Psychiatry. 2015; 56 :130–137. doi: 10.1111/jcpp.12288. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Marioni RE, Davies G, Hayward C, Liewald D, Kerr SM, Campbell A, Deary IJ. Molecular genetic contributions to socioeconomic status and intelligence. Intelligence. 2014; 44 :26–32. doi: 10.1016/j.intell.2014.02.006. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Marszalek JM, Barber C, Kohlhart J, Holmes CB. Sample size in psychological research over the past 30 years. Perceptual and Motor Skills. 2011; 112 :331–348. doi: 10.2466/03.11.pms.112.2.331-348. [ PubMed ] [ Google Scholar ]
  • Martin J, Hamshere ML, Stergiakouli E, O’Donovan MC, Thapar A. Genetic risk for attention-deficit/hyperactivity disorder contributes to neurodevelopmental traits in the general population. Biological Psychiatry. 2014; 76 :664–671. doi: 10.1016/j.biopsych.2014.02.013. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Matteson LK, McGue M, Iacono WG. Shared environmental influences on personality: A combined twin and adoption approach. Behavior Genetics. 2013; 43 :491–504. doi: 10.1007/s10519-013-9616-8. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • McAdams TA, Gregory AM, Eley TC. Genes of experience: Explaining the heritability of putative environmental variables through their association with behavioural and emotional traits. Behavior Genetics. 2013; 43 :314–328. doi: 10.1007/s10519-013-9591-0. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • McAdams TA, Neiderhiser JM, Rijsdijk FV, Narusyte J, Lichtenstein P, Eley TC. Accounting for genetic and environmental confounds in associations between parent and child characteristics: A systematic review of children-of-twins studies. Psychological Bulletin. 2014; 140 :1138–1173. doi: 10.1037/a0036416. [ PubMed ] [ Google Scholar ]
  • McGue M, Bacon S, Lykken DT. Personality stability and change in early adulthood: A behavioral genetic analysis. Developmental Psychology. 1993; 29 :96–109. doi: 10.1037/0012-1649.29.1.96. [ Google Scholar ]
  • McGue M, Bouchard TJ, Jr., Iacono WG, Lykken DT. Behavioral genetics of cognitive ability: A life-span perspective. In: Plomin R, McClearn GE, editors. Nature, nurture, and psychology. American Psychological Association; Washington, DC: 1993. pp. 59–76. [ Google Scholar ]
  • McGue M, Christensen K. Growing old but not growing apart: Twin similarity in the latter half of the lifespan. Behavior Genetics. 2013; 43 :1–12. doi: 10.1007/s10519-012-9559-5. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • McGue M, Zhang Y, Miller MB, Basu S, Vrieze S, Hicks B, Iacono WG. A genome-wide association study of behavioral disinhibition. Behavior Genetics. 2013; 43 :363–373. doi: 10.1007/s10519-013-9606-x. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Mendel GJ. Versuche ueber Pflanzenhybriden. Verhandlungen des Naturforschenden Vereines in Bruenn. 1866; 4 :3–47. [ Google Scholar ]
  • Middeldorp CM, Cath DC, Van Dyck R, Boomsma DI. The co-morbidity of anxiety and depression in the perspective of genetic epidemiology. A review of twin and family studies. Psychological Medicine. 2005; 35 :611–624. doi: 10.1017/S003329170400412. [ PubMed ] [ Google Scholar ]
  • Narusyte J, Neiderhiser JM, D'Onofrio BM, Reiss D, Spotts EL, Ganiban J, Lichtenstein P. Testing different types of genotype-environment correlation: An extended children-of-twins model. Developmental Psychology. 2008; 44 :1591–1603. doi: 10.1037/a0013911. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Neiderhiser JM, Reiss D, Hetherington EM, Plomin R. Relationships between parenting and adolescent adjustment over time: Genetic and environmental contributions. Developmental Psychology. 1999; 35 :680–692. doi: 10.1037/0012-1649.35.3.680. [ PubMed ] [ Google Scholar ]
  • Nosek BA, Spies JR, Motyl M. Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science. 2012; 7 :615–631. doi: 10.1177/1745691612459058. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015; 349 :aac4716. doi: 10.1126/science.aac4716. [ PubMed ] [ Google Scholar ]
  • Palmer RHC, Brick L, Nugent NR, Bidwell LC, McGeary JE, Knopik VS, Keller MC. Examining the role of common genetic variants on alcohol, tobacco, cannabis and illicit drug dependence: Genetics of vulnerability to drug dependence. Addiction. 2015; 110 :530–537. doi: 10.1111/add.12815. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Panizzon MS, Vuoksimaa E, Spoon KM, Jacobson KC, Lyons MJ, Franz CE, Kremen WS. Genetic and environmental influences on general cognitive ability: Is g a valid latent construct? Intelligence. 2014; 43 :65–76. doi: 10.1016/j.intell.2014.01.008. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Pashler H, Wagenmakers E–J. Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science. 2012; 7 :528–530. doi: 10.1177/1745691612465253. [ PubMed ] [ Google Scholar ]
  • Petrill SA. Molarity versus modularity of cognitive functioning? A behavioral genetic perspective. Current Directions in Psychological Science. 1997; 6 :96–99. doi: 10.1111/1467-8721.ep11512833. [ Google Scholar ]
  • Pike A, McGuire S, Hetherington EM, Reiss D, Plomin R. Family environment and adolescent depressive symptoms and antisocial behavior: A multivariate genetic analysis. Developmental Psychology. 1996; 32 :590–603. doi: 10.1037/0012-1649.32.4.590. [ Google Scholar ]
  • Pinker S. The blank slate: The modern denial of human nature. Penguin; New York: 2002. [ Google Scholar ]
  • Plomin R. Development, genetics, and psychology. Erlbaum; Hillsdale, NJ: 1986. [ Google Scholar ]
  • Plomin R. Environment and genes. Determinants of behavior. American Psychologist. 1989; 44 :105–111. doi: 10.1037/0003-066X.44.2.105. [ PubMed ] [ Google Scholar ]
  • Plomin R. Genetics and experience: The interplay between nature and nurture. Sage Publications Inc.; Thousand Oaks, CA: 1994. [ Google Scholar ]
  • Plomin R. Commentary: Why are children in the same family so different? Non-shared environment three decades later. International Journal of Epidemiology. 2011; 40 :582–592. doi: 10.1093/ije/dyq144. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Plomin R, Bergeman CS. The nature of nurture: Genetic influence on “environmental” measures. (With open peer commentary) Behavioral and Brain Sciences. 1991; 14 :373–414. doi: 10.1017/S0140525X00070278. [ Google Scholar ]
  • Plomin R, Daniels D. Children in the same family are very different, but why? Behavioral and Brain Sciences. 1987a; 10 :44–55. doi: 10.1017/S0140525X00056272. [ Google Scholar ]
  • Plomin R, Daniels D. Why are children in the same family so different from each other? Behavioral and Brain Sciences. 1987b; 10 :1–16. doi: 10.1017/S0140525X00055941. [ Google Scholar ]
  • Plomin R, Deary IJ. Genetics and intelligence differences: Five special findings. Molecular Psychiatry. 2015; 20 :98–108. doi: 10.1038/mp.2014.105. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Plomin R, DeFries JC. Origins of individual differences in infancy: The Colorado Adoption Project. Academic Press, Inc.; Orlando, FL: 1985. [ Google Scholar ]
  • Plomin R, DeFries JC, Knopik VS, Neiderhiser JM. Behavioral genetics . 6th ed. Worth Publishers; New York: 2013. [ Google Scholar ]
  • Plomin R, Haworth CMA, Davis OSP. Common disorders are quantitative traits. Nature Reviews Genetics. 2009; 10 :872–878. doi: 10.1038/nrg2670. [ PubMed ] [ Google Scholar ]
  • Plomin R, Kovas Y. Generalist genes and learning disabilities. Psychological Bulletin. 2005; 131 :592–617. doi: 10.1037/0033-2909.131.4.592. [ PubMed ] [ Google Scholar ]
  • Plomin R, Loehlin JC, DeFries JC. Genetic and environmental components of “environmental” influences. Developmental Psychology. 1985; 21 :391–402. doi: 10.1037/0012-1649.21.3.391. [ Google Scholar ]
  • Plomin R, Simpson MA. The future of genomics for developmentalists. Development and Psychopathology. 2013; 25 :1263–1278. doi: 10.1017/S0954579413000606. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Nature Genetics. 2015 Advance online publication. doi: 10.1038/ng.3285. [ PubMed ] [ Google Scholar ]
  • Power RA, Wingenbach T, Cohen-Woods S, Uher R, Ng MY, Butler AW, McGuffin P. Estimating the heritability of reporting stressful life events captured by common genetic variants. Psychological Medicine. 2013; 43 :1965–1971. doi: 10.1017/S0033291712002589. [ PubMed ] [ Google Scholar ]
  • Prinz F, Schlange T, Asadullah K. Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery. 2011; 10 :712. [ PubMed ] [ Google Scholar ]
  • Reichenberg A, Cederlöf M, McMillan A, Trzaskowski M, Davidson M, Weiser M, Lichtenstein P. Discontinuity in the genetic and environmental causes of the intellectual disability spectrum. Proceedings of the National Academy of Sciences USA. (in press) [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Rhee SH, Lahey BB, Waldman ID. Comorbidity among dimensions of childhood psychopathology: Converging evidence from behavior genetics. Child Development Perspectives. 2015; 9 :26–31. doi: 10.1111/cdep.12102. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Rhee SH, Waldman ID. Genetic and environmental influences on antisocial behavior: A meta-analysis of twin and adoption studies. Psychological Bulletin. 2002; 128 :490–529. doi: 10.1037/0033-2909.128.3.490. [ PubMed ] [ Google Scholar ]
  • Rietveld CA, Cesarini D, Benjamin DJ, Koellinger PD, De Neve JE, Tiemeier H, Bartels M. Molecular genetics and subjective well-being. Proceedings of the National Academy of Sciences (USA) 2013; 110 :9692–9697. doi: 10.1073/pnas.1222171110. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Rietveld CA, Conley D, Eriksson N, Esko T, Medland SE, Vinkhuyzen AAE, The Social Science Genetics Association Consortium Replicability and robustness of genome-wide-association studies for behavioral traits. Psychological Science. 2014; 25 :1975–1986. doi: 10.1177/0956797614545132. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Rietveld CA, Medland SE, Derringer J, Yang J, Esko T, Martin NW, Koellinger PD. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science. 2013; 340 (6139):1467–1471. doi: 10.1126/science.1235488. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Rietveld MJ, Hudziak JJ, Bartels M, Van Beijsterveldt CE, Boomsma DI. Heritability of attention problems in children: Longitudinal results from a study of twins, age 3 to 12. Journal of Child Psychology and Psychiatry. 2004; 45 :577–588. doi: 10.1111/j.1469-7610.2004.00247.x. [ PubMed ] [ Google Scholar ]
  • Ripke S, O'Dushlaine C, Chambert K, Moran JL, Kahler AK, Akterin S, Sullivan PF. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nature Genetics. 2013; 45 :1150–1159. doi: 10.1038/ng.2742. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Robinson EB, Neale BM, Daly MJ. Epidemiology of neuropsychiatric and developmental disorders of childhood. In: Charney DS, editor. Neurobiology of mental illness. Oxford University Press; New York: 2015. pp. 993–943. [ Google Scholar ]
  • Robinson MR, Wray NR, Visscher PM. Explaining additional genetic variation in complex traits. Trends in Genetics. 2014; 30 :124–132. doi: 10.1016/j.tig.2014.02.003. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Scarr S, McCartney K. How people make their own environments: A theory of genotype greater than environmental effects. Child Development. 1983; 54 :424–435. doi: 10.2307/1129703. [ PubMed ] [ Google Scholar ]
  • Schizophrenia Working Group of the Psychiatric Genomics Consortium Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014; 511 (7510):421–427. doi: 10.1038/nature13595. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Schmidt S. Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology. 2009; 13 :90–100. doi: 10.1037/a0015108. [ Google Scholar ]
  • Shikishima C, Hiraishi K, Yamagata S, Neiderhiser JM, Ando J. Culture moderates the genetic and environmental etiologies of parenting: A cultural behavior genetic approach. Social Psychological and Personality Science. 2012; 4 :434–444. doi: 10.1177/1948550612460058. [ Google Scholar ]
  • Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science. 2011; 22 :1359–1366. doi: 10.1177/0956797611417632. [ PubMed ] [ Google Scholar ]
  • Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Loos RJ. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature Genetics. 2010; 42 :937–948. doi: 10.1038/ng.686. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • St Pourcain B, Cents RA, Whitehouse AJ, Haworth CM, Davis OS, O'Reilly PF, Davey Smith G. Common variation near ROBO2 is associated with expressive vocabulary in infancy. Nature Communications. 2014; 5 :4831. doi: 10.1038/ncomms5831. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Stanley DJ, Spence JR. Expectations for replications: Are yours realistic? Perspectives on Psychological Science. 2014; 9 :305–318. doi: 10.1177/1745691614528518. [ PubMed ] [ Google Scholar ]
  • Stergiakouli E, Martin J, Hamshere ML, Langley K, Evans DM, St Pourcain B, Davey Smith G. Shared genetic influences between attention-deficit/hyperactivity disorder (ADHD) traits in children and clinical ADHD. Journal of the American Academy of Child and Adolescent Psychiatry. 2015; 54 :322–327. doi: 10.1016/j.jaac.2015.01.010. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Sullivan PF, Kendler KS, Neale MC. Schizophrenia as a complex trait: Evidence from a meta-analysis of twin studies. Archives of General Psychiatry. 2003; 60 :1187–1192. doi: 10.1001/archpsyc.60.12.1187. [ PubMed ] [ Google Scholar ]
  • Trzaskowski M, Davis OS, Defries JC, Yang J, Visscher PM, Plomin R. DNA evidence for strong genome-wide pleiotropy of cognitive and learning abilities. Behavior Genetics. 2013; 43 :267–273. doi: 10.1007/s10519-013-9594-x. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Trzaskowski M, Harlaar N, Arden R, Krapohl E, Rimfeld K, McMillan A, Plomin R. Genetic influence on family socioeconomic status and children's intelligence. Intelligence. 2014; 42 :83–88. doi: 10.1016/j.intell.2013.11.002. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Tucker-Drob EM, Briley DA. Continuity of genetic and environmental influences on cognition across the life span: A meta-analysis of longitudinal twin and adoption studies. Psychological Bulletin. 2014; 140 :949–979. doi: 10.1037/a0035893. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Turkheimer E. Three laws of behavior genetics and what they mean. Current Directions in Psychological Science. 2000; 9 :160–164. doi: 10.1111/1467-8721.00084. [ Google Scholar ]
  • Turkheimer E, Pettersson E, Horn EE. A phenotypic null hypothesis for the genetics of personality. Annual Review of Psychology. 2014; 65 :515–540. doi: 10.1146/annurev-psych-113011-143752. [ PubMed ] [ Google Scholar ]
  • Turkheimer E, Waldron M. Nonshared environment: A theoretical, methodological, and quantitative review. Psychological Bulletin. 2000; 126 :78–108. doi: 10.1037/0033-2909.126.1.78 van. [ PubMed ] [ Google Scholar ]
  • van Beijsterveldt CE, Bartels M, Hudziak JJ, Boomsma DI. Causes of stability of aggression from early childhood to adolescence: A longitudinal genetic analysis in Dutch twins. Behavior Genetics. 2003; 33 :591–605. doi: 10.1023/A:1025735002864. [ PubMed ] [ Google Scholar ]
  • Verweij KJH, Yang J, Lahti J, Veijola J, Hintsanen M, Pulkki-Råback L, Zietsch BP. Maintenance of genetic variation in human personality: Testing evolutionary models by estimating heritability due to common causal variants and investigating the effect of distant inbreeding. Evolution. 2012; 66 :3238–3251. doi: 10.1111/j.1558-5646.2012.01679.x. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Vinkhuyzen AAE, Pedersen NL, Yang J, Lee SH, Magnusson PKE, Iacono WG, Wray NR. Common SNPs explain some of the variation in the personality dimensions of neuroticism and extraversion. Translational Psychiatry. 2012; 2 :e102. doi: 10.1038/tp.2012.27. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Vinkhuyzen AA, Wray NR, Yang J, Goddard ME, Visscher PM. Estimation and partition of heritability in human populations using whole-genome analysis methods. Annual Review of Genetics. 2013; 47 :75–95. doi: 10.1146/annurev-genet-111212-133258. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Visscher PM. Power of the classical twin design revisited. Twin Research. 2004; 7 :505–512. doi: 10.1375/1369052042335250. [ PubMed ] [ Google Scholar ]
  • Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. American Journal of Human Genetics. 2012; 90 :7–24. doi: 10.1016/j.ajhg.2011.11.029. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Vrieze SI, McGue M, Miller MB, Hicks BM, Iacono WG. Three mutually informative ways to understand the genetic relationships among behavioral disinhibition, alcohol use, drug use, nicotine use/dependence, and their co-occurrence: Twin biometry, GCTA, and genome-wide scoring. Behavior Genetics. 2013; 43 :97–107. doi: 10.1007/s10519-013-9584-z. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Wray NR, Lee SH, Mehta D, Vinkhuyzen AA, Dudbridge F, Middeldorp CM. Research review: Polygenic methods and their application to psychiatric traits. Journal of Child Psychology and Psychiatry. 2014; 55 :1068–1087. doi: 10.1111/jcpp.12295. [ PubMed ] [ Google Scholar ]
  • Yang JA, Lee SH, Goddard ME, Visscher PM. GCTA: A tool for genome-wide complex trait analysis. American Journal of Human Genetics. 2011; 88 :76–82. doi: 10.1016/j.ajhg.2010.11.011. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Yang J, Lee T, Kim J, Cho M-C, Han B-G, Lee J-Y, Kim H. Ubiquitous polygenicity of human complex traits: Genome-wide analysis of 49 traits in Koreans. PLoS Genetics. 2013; 9 :e1003355. doi: 10.1371/journal.pgen.1003355. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Zavos HM, Freeman D, Haworth CM, McGuire P, Plomin R, Cardno AG, Ronald A. Consistent etiology of severe, frequent psychotic experiences and milder, less frequent manifestations: A twin study of specific psychotic experiences in adolescence. JAMA Psychiatry. 2014; 71 :1049–1057. doi: 10.1001/jamapsychiatry.2014.994. [ PMC free article ] [ PubMed ] [ Google Scholar ]


Working together, we can reimagine medicine to improve and extend people’s lives.

Analytical Expert (ARD) (m/f/d)

About the role.

Major accountabilities:

  • Designing, planning, supporting the execution as well as interpreting and reporting results of scientific experiments for the development and timely supply of drug substances (DS) and drug products (DP) intended for clinical use in late stage development and potential commercialization.
  • Writing and reviewing analytical documents (e.g. analytical procedures, specifications, product characterization reports, validation protocols/reports, stability protocols/reports, as well as batch records compilation and line function material disposition for stability and release testing) and aligning the corresponding activities within a global project team.
  • Managing interactions with internal and external stakeholders, including outsourced activities to CROs by providing scientific and technical guidance whenever necessary.
  • Proactively identifying scientific, technological and GMP challenges, proposing creative solutions and communicating key issues to the appropriate management level and respective technical project team.
  • Working according to appropriate SOPs, GMP, Quality Directives, Health and Safety & internal Novartis guidelines.

Minimum Requirements:

  • Minimum: Bachelor in analytical chemistry or equivalent with significant experience in analytical development of drugs. Desirable: Advanced degree in a relevant life science scientific area (e.g. Master, Ph.D. or equivalent in chemistry / pharmaceutical or analytical science).
  • Preferably 5 years’ experience in the pharmaceutical industry with a track record in GMP activities for development or marketed products.
  • Broad scientific knowledge in chemistry, pharmaceutical or analytical sciences, ability to perform in a global and highly dynamic environment.
  • Advanced knowledge of analytical techniques and associated processes (e.g. HPLC and the corresponding chromatographic data system, dissolution rate, quality management systems, statistical evaluation tools).
  • Good presentation skills and scientific/technical writing skills and associated IT Tools.
  • Fluent in English (oral and writing), German is advantageous.

Why Novartis? Our purpose is to reimagine medicine to improve and extend people’s lives and our vision is to become the most valued and trusted medicines company in the world. How can we achieve this? With our people. It is our associates that drive us each day to reach our ambitions. Be a part of this mission and join us! Learn more here: https://www.novartis.com/about/strategy/people-and-culture

You’ll receive: You can find everything you need to know about our benefits and rewards in the Novartis Life Handbook. https://www.novartis.com/careers/benefits-rewards

Commitment to Diversity and Inclusion: Novartis is committed to building an outstanding, inclusive work environment and diverse teams representative of the patients and communities we serve.

Accessibility and accommodation: Novartis is committed to working with and providing reasonable accommodation to all individuals. If, because of a medical condition or disability, you need a reasonable accommodation for any part of the recruitment process, or in order to receive more detailed information about the essential functions of a position, please send an e-mail to inclusion.switzerland@novartis.com and let us know the nature of your request and your contact information. Please include the job requisition number in your message.

Join our Novartis Network: If this role is not suitable to your experience or career goals but you wish to stay connected to hear more about Novartis and our career opportunities, join the Novartis Network here: https://talentnetwork.novartis.com/network

Why Novartis: Helping people with disease and their families takes more than innovative science. It takes a community of smart, passionate people like you. Collaborating, supporting and inspiring each other. Combining to achieve breakthroughs that change patients’ lives. Ready to create a brighter future together? https://www.novartis.com/about/strategy/people-and-culture

Join our Novartis Network: Not the right Novartis role for you? Sign up to our talent community to stay connected and learn about suitable career opportunities as soon as they come up: https://talentnetwork.novartis.com/network

Benefits and Rewards: Read our handbook to learn about all the ways we’ll help you thrive personally and professionally: https://www.novartis.com/careers/benefits-rewards



  • Research Article
  • Open access
  • Published: 29 August 2024

Human-water interactions associated to cercarial emergence pattern and their influences on urinary schistosomiasis transmission in two endemic areas in Mali

  • Bakary Sidibé 1,
  • Privat Agniwo 1, 2,
  • Assitan Diakité 1,
  • Boris Agossou Eyaton-olodji Sègnito Savassi 2, 3,
  • Safiatou Niaré Doumbo 1,
  • Ahristode Akplogan 1,
  • Hassim Guindo 1,
  • Moudachirou Ibikounlé 2,
  • Laurent Dembélé 1,
  • Abdoulaye Djimde 1,
  • Jérôme Boissier 3 &
  • Abdoulaye Dabo 1

Infectious Diseases of Poverty volume 13, Article number: 62 (2024)

Background

Mali is a schistosomiasis-endemic country with a limited supply of clean water. This has forced many communities to rely on open freshwater bodies for many human-water contact (HWC) activities. However, the relationship between contact with these water systems and the level of schistosome infection has received limited attention. This study assessed human-water interactions, including the cercarial emergence pattern, and their influence on urinary schistosomiasis transmission in two communities in the Kayes region of Mali.

Methods

We first carried out a parasitological study in children in September 2021, followed in September 2022 by a cross-sectional study of quantitative observations of human-water contact activities in the population and a study of snail infectivity at contact points. The study took place in two communities, Fangouné Bamanan and Diakalel, in the Kayes region of western Mali. The chronobiological study focused on cercarial release from naturally infected snails. Released cercariae were molecularly genotyped by targeting the cox1 region and the ITS and 18S ribosomal DNA (18S rDNA) regions. Links between sociodemographic parameters, human water-contact points and hematuria were established using multivariate statistical analysis or a logistic regression model.

Results

The main factor predisposing the 97 participants to water contact was domestic activity (62.9%). Of the 378 snails collected at 14 sampling sites, 27 (7.1%) shed schistosome cercariae: 15.0% (19/126) at Fangouné Bamanan and 3.3% (8/252) at Diakalel. Cercarial release followed three patterns at Fangouné Bamanan: (i) an early release peak (6:00–8:00 AM), (ii) a mid-day release peak (10:00 AM–12:00 PM) and (iii) a double peak (6:00–8:00 AM and 6:00–8:00 PM); and two patterns at Diakalel: (i) an early release peak (6:00–8:00 AM) and (ii) a mid-day release peak (12:00–2:00 PM). All cercariae released in the early diurnal (6:00–8:00 AM) or nocturnal (6:00–8:00 PM) patterns were hybrid parasites carrying a cox1 profile of S. bovis or S. curassoni together with ITS and 18S rDNA profiles of S. haematobium, whereas cercariae released in the diurnal or mid-day patterns (8:00 AM–6:00 PM) were pure S. haematobium.

Conclusions

Our study showed that domestic activity is the main source of exposure in the Kayes region. Two and three cercariae emission patterns were observed at Diakalel and Fangouné Bamanan respectively. These results suggest that the parasite adapts to the human-water contact period in order to increase its infectivity.


Schistosomiasis is a widely recognized parasitic disease of deprived populations in Africa, Asia, and Central and South America, with a significant health impact on both human and animal populations [1, 2]. The disease affects almost 240 million people worldwide, most of whom live in sub-Saharan Africa [3]. West African countries such as Ghana, Mali, Burkina Faso, Côte d’Ivoire, Niger, Senegal and Nigeria are considered highly endemic for schistosomiasis [4, 5, 6, 7, 8, 9, 10]. The major schistosome species include Schistosoma haematobium (Sh) and S. mansoni (Sm) [11]. Sh accounted for over 85% of cases of urogenital schistosomiasis, while Sm, the causal agent of intestinal schistosomiasis, was less prevalent [12, 13].

In Mali, two Schistosoma species (Sh and Sm) have been reported since the first cases were recorded in the 1940s [14]. Currently, both forms of the disease occur, with geographical variations in both prevalence and intensity [15]. Except in the irrigated rice-growing areas of the “Office du Niger”, where the two species are co-endemic, Sh appears to be the most common species, especially on the Dogon plateau and in the Senegal River Basin [16, 17, 18]. Other schistosomes (S. bovis, S. curassoni, S. mattheei and S. rodhaini) infect domestic and wild animals such as livestock and rodents. The cohabitation of humans and animals in the same environment can lead to hybridization of the schistosome species they host, a phenomenon quite common in many schistosomiasis-endemic areas of sub-Saharan Africa [19, 20, 21, 22]. The primary snail species implicated in the transmission of human schistosomiasis are Bulinus species (B. truncatus and B. globosus) for Sh and Biomphalaria pfeifferi for Sm [23, 24, 25].

Transmission of schistosomiasis is strongly influenced by people's water-contact behaviors (swimming, fishing, bathing, washing and laundry) in endemic areas. In these areas, people have always depended on surface water systems, which results in contamination of water sources by excreta and in exposure of humans to infectious diseases such as schistosomiasis. Equally important are the presence and distribution of infected intermediate snail hosts within watercourses, which play a crucial role in the disease's transmission dynamics [26]. The schistosome parasite has a complex life cycle involving two hosts: a freshwater snail, the intermediate host in which the parasite undergoes larval development, and a definitive host (human or animal) in which the parasite matures into an adult [27, 28]. To ensure the survival of the species and successful transmission, many trematode species, including schistosomes, have synchronized their daily emergence rhythms with the times at which vertebrate hosts visit the biotope [29, 30, 31, 32]. Animals drink early in the morning (between 6:00 and 10:00 AM) before going to pasture or late (between 6:00 and 8:00 PM) after their return, which promotes S. bovis (Sb) infestation. In contrast, people are drawn to water during the hot hours between 10:00 AM and 6:00 PM (swimming and recreational activities), which opens the host/parasite encounter filter and favors Sh infestation. Because snails are the intermediate hosts that release the cercariae that infest humans, examining them provides important information on active transmission foci.
Because of the possibility of hybridization between human and animal schistosomes, we considered it useful to identify the genetic profiles of cercariae released by naturally infected snail hosts at different times of the day. Moreover, there has been wide advocacy to integrate water, sanitation and hygiene (WASH), health education, environmental actions and snail control into the mass drug administration (MDA) control strategy [33]. For almost two decades, Mali’s Schistosomiasis National Control Program has consistently adopted the preventive chemotherapy strategies recommended by the World Health Organization (WHO) [34, 35]. Regrettably, MDA alone does not offer effective protection against initial infection or subsequent re-infection in contaminated environments. The continued transmission of schistosomiasis in regions with a high disease burden, such as the Office du Niger, the Dogon plateau and the Senegal River Basin [7, 36, 37], underscores the need for supplementary control strategies that address the patterns of human-water contact and the factors that sustain the schistosome life cycle and its transmission. In redefining WHO's priorities to sustain results, snail control has recently been re-prioritized as a schistosomiasis control strategy to complement MDA [3]. However, a major knowledge gap remains, especially regarding how snail biology and ecology affect schistosomiasis transmission and control outcomes. Despite the interventions, including MDA, schistosomiasis remains a serious threat to populations, especially in the Office du Niger and the Senegal River basin, which are recognized as development hubs in the country.
Although the hybrid strains of schistosomes recently described in Mali [19, 38, 39, 40] could be involved, schistosomiasis infection, human water contact and snail biology are essentially linked, and more knowledge about their relationship will help to develop appropriate control measures. So far, few studies have related water contact patterns to infection levels in Mali. The aim of this study was to examine the influence of human-water interactions, including snail biology, on urinary schistosomiasis transmission in the Senegal River basin in Mali.

Study sites

We carried out this study in two communities in the Kayes region of western Mali (geographical coordinates between 11°26′40″ W and 14°26′48″ N) known for their endemicity for Sh [19]. The two communities surveyed, Fangouné Bamanan (Diéma district) and Diakalel (Kayes district), are 300 km apart (Fig. 1). They were chosen for their proximity to water sources (ponds in the Diéma district; the Senegal River and its tributaries in the vicinity of the city of Kayes). The Kayes region has a northern Sudanese climate in the south and a Sahelian climate in the north, with two main seasons: the rainy season (June to October), with average annual rainfall of up to 1000 mm in the south and 600 to 800 mm in the north, and the dry season, which extends from November to April–May [41]. The dry season is divided into a cold dry season (November to February) and a hot dry season (March to May). The water points fed by rainwater (ponds and river tributaries) are excellent snail breeding sites. Agriculture and livestock farming are the two main economic activities of the population [41]. The Sudano-Sahelian climate of the region favors crop cultivation and especially extensive livestock farming, in which numerous herds of cattle, sheep and goats cohabit. The practice of these two activities around the same water points creates favorable conditions for the mixing of genes between animal and human schistosomes.

Fig. 1 Location of the two study sites (Diakalel and Fangouné Bamanan) on the map of the Kayes region (Mali, West Africa) in September 2021 and September 2022

Type of study and parasitological examination of urine

We conducted a cross-sectional, observational study that included a parasitological survey of schoolchildren in September 2021. We calculated the minimum sample size from the previous prevalence of the disease (36%) obtained in each school using the Schwartz formula, taking into account a 10% refusal rate and sampling errors [42]. Students were selected in each school by simple random sampling from the class list: names placed in an envelope were drawn at random until the required size was reached. Urine samples were collected in sterile containers from 393 children aged 6–14 years between 9:00 AM and 2:00 PM. Each child was assigned an identification number based on the first two letters of the village name. Once the urine had been homogenized in the jar, a 10 ml aliquot was taken with a syringe and filtered through a numbered Whatman filter paper (diameter 25 mm) previously placed in a filter holder. The filter was then stained with 3% ninhydrin, dried, rewetted with tap water and examined for Sh eggs under a compound microscope with a × 4 or × 10 objective. Prevalence and intensity of infection (low: 1–49 eggs/10 ml of urine; high: ≥ 50 eggs/10 ml of urine) were determined according to the WHO standard. Ten percent (10%) of the filters were re-examined by a senior parasitologist for quality control. All schoolchildren infected with schistosomes were questioned verbally about the water sites they frequented, in order to search for the snail vectors and evaluate the parameters favouring infection of the population (cercarial release time from infected snails, human-water contact time, etc.).
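As a rough illustration of the sample-size step, the sketch below applies the standard single-proportion formula n = z²p(1 − p)/d², commonly referred to as the Schwartz formula, and then inflates the result for refusals. The 36% prevalence and the 10% refusal rate come from the text; the 95% confidence level (z = 1.96), the 5% precision and the way the refusal rate is applied are assumptions, since the text does not state them. The second helper encodes the egg-count thresholds quoted above. All function names are hypothetical.

```python
import math


def schwartz_sample_size(prevalence, precision=0.05, z=1.96):
    """Minimum sample size for estimating a proportion: n = z^2 * p * (1 - p) / d^2.

    precision (d) and z are assumed values, not taken from the study.
    """
    n = (z ** 2) * prevalence * (1.0 - prevalence) / (precision ** 2)
    return math.ceil(n)


def adjust_for_refusal(n, refusal_rate=0.10):
    """Inflate n so the target is still met after expected refusals (assumed method)."""
    return math.ceil(n / (1.0 - refusal_rate))


def infection_intensity(eggs_per_10ml):
    """Classify an egg count per 10 ml of urine using the thresholds quoted in the text."""
    if eggs_per_10ml < 0:
        raise ValueError("egg count cannot be negative")
    if eggs_per_10ml == 0:
        return "negative"
    return "low" if eggs_per_10ml < 50 else "high"


base = schwartz_sample_size(0.36)    # prevalence from the text
target = adjust_for_refusal(base)    # 10% refusal rate from the text
print(base, target)
print(infection_intensity(12), infection_intensity(50))
```

Different choices of precision or refusal adjustment give different targets, so the numbers printed here should not be read as the study's own calculation.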

Human water contact interactions and exposure risk

A human water contact survey coupled with malacological surveys was conducted during the cross-sectional study in September 2022. The information gathered from infected schoolchildren was used to select specific areas for the malacological surveys and the analysis of human-water contact times. During the observation of contacts with water, a socio-anthropologist administered structured questionnaires to participants in the local Bamanankan language. Information was recorded on their water contact habits, access to drinking water, sanitation and hygiene facilities, self-reported experiences of schistosomiasis, perceptions of exposure and risk factors, and the presence of blood in urine. We opted for an exhaustive study, including all people who came into contact with water during the study period and who gave their consent. The study included all villagers living in the study areas, regardless of age and sex.

During the study of human-water contact, we carried out on-site observations of activities bringing people into contact with water at the main contact points. The duration of each contact was timed by the interviewer using a stopwatch. The observations were made during one week in September 2022 (rainy season). Six human water-contact points (HWCP, A to F) in Diakalel and eight (A to H) in Fangouné Bamanan were surveyed over the three months (Fig. 2).

Fig. 2 Map of the sampled human-water contact points (HWCP) in the two study sites (Diakalel and Fangouné Bamanan), September 2022. Letters (A, B, C, D, E, F, G and H) denote the HWCP at each site

The main information collected was age, sex, body part in contact with water and duration of water contact activities. The activities requiring contact with water identified in each community were classified into three main categories: domestic (washing kitchen utensils, laundry, fetching water), occupational (fishing, crossing water, watering animals) and recreational (bathing, swimming, playing). Exposure was defined as any water contact activity in water infested with intermediate host snails.

Water contact duration, water-contact frequency, and water contact activities

Having any water contact was defined as a binary measure (i.e., whether the respondent had contact with one of the freshwater sites we examined during the study period). Contact parameters were recorded according to the method described previously [ 43 ]. Duration of water contact was defined as how long the individual was in contact with the water per exposure event; water-contact frequency was defined as the number of times per day or per week the respondent was in contact with any of the freshwater sites studied; and water-contact activities were defined as whether the respondent engaged in a given type of water-contact activity (recreational, domestic or occupational).

Geographical distribution of snails, snail sampling and cercarial management

To determine the geographical distribution of snail intermediate hosts, all sampled habitats were mapped using hand-held differential global positioning system (GPS) units (Trimble Navigation Ltd, California, USA) with an estimated accuracy of ± 1 m. Data were downloaded with differential correction into a GPS database (GPS Pathfinder Office 2.8, Trimble Navigation Ltd, California, USA) and analyses were performed using ArcView version 9.2 software (Environmental Systems Research Institute, Inc., Redlands, CA).

We collected snail intermediate hosts in two communities, Fangouné Bamanan and Diakalel, at the same points where human-water contact activities were observed (Fig. 2). Snail sampling was conducted by two field collectors throughout the study using standard snail sieves or, occasionally, by hand-picking with long pliers from rocks, rags, old mats, cans, etc. Sampling time was about 15 min per HWCP; sampling was performed between 9:00 AM and 12:00 PM during the rainy and cold dry seasons and between 8:00 AM and 11:00 AM in the hot dry season. The sampling area per HWCP varied from approximately 3 m² to 5 m², according to the surface to be examined. At each collection time, snails from each site were labelled and transported in separate perforated plastic containers to Kayes or Diema, where they were processed. Snails were identified to species level based on shell morphological characteristics. Other relevant parameters were recorded at the human-water contact points, such as the plant and animal species associated with snails, vegetation cover, food remains, and the presence of excreta (feces) in the vicinity.

Cercarial releasing pattern

Collected snails were rinsed and placed individually in the wells of 24-well culture plates containing 1 ml of clear, filtered water from the snail collection sites. To test whether the snails were infected, they were exposed to indirect sunlight for 24 h to induce cercarial shedding. The wells were then examined for the presence of cercariae under a dissecting microscope. Snails that did not shed cercariae on the first exposure were re-exposed on the second day. Bifurcate cercariae were taken to indicate cercariae of mammalian origin. The rhythm of cercarial emission from each positive snail was determined over 24 h with a count every two hours, starting from 6:00 AM. For each snail, the study was carried out over seven consecutive days to show the stability of the emission pattern. The technique used has been described previously [ 44 ]. Briefly, each infested snail was placed in a glass container with 150 ml of well water at a temperature between 24 °C and 25 °C. Every two hours, each snail was transferred to a new container with the same volume of water. The water left in the previous container, which held the emitted cercariae, was filtered through a Nytrel polyamide filter (25 μm mesh size). The cercariae retained on the filter were stained with Lugol's solution and then counted under a binocular magnifying glass, the red coloration making them easy to see.

Several cercariae released by each infested snail were stored individually on FTA cards (QIAGEN, Hilden, Germany) and then identified by molecular targeting of the nuclear (ITS2 and 18S rDNA) and mitochondrial (cox1) regions of DNA [ 19 ]. Genetic profiles were assigned to parasites using the haploid mitochondrial gene cox1 (first two letters) and the diploid nuclear region ITS2/18S (last four letters). These include "pure" Sh (Sh_cox1, Sh_ITS2, Sh_18S: ShxShSh) and the hybrids (Sb/Sc_cox1, Sh_ITS2, Sh_18S: Sb/ScxShSh), (Sh_cox1, Sb_ITS2, Sb_18S: ShxSbSb) and (Sh_cox1, Sc_ITS2, Sc_18S: ShxScSc).
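
The labelling rule described above can be sketched in a few lines of Python. This is only an illustration of how the mitochondrial call (cox1) and the two nuclear calls (ITS2, 18S) combine into a label such as ShxShSh. The function and class names are hypothetical and are not part of the study's workflow. The rule that three identical calls count as "pure" and anything else as "hybrid" is an assumption drawn from the four profiles listed in the text.

```python
from typing import NamedTuple


class GeneticProfile(NamedTuple):
    label: str   # e.g. "ShxShSh": cox1 call, "x", then ITS2 and 18S calls
    status: str  # "pure <species>" or "hybrid"


def assign_profile(cox1: str, its2: str, rdna18s: str) -> GeneticProfile:
    """Combine three marker calls into a profile label (hypothetical helper)."""
    label = f"{cox1}x{its2}{rdna18s}"
    # Assumption: a parasite is "pure" only when all three markers agree.
    if cox1 == its2 == rdna18s:
        status = f"pure {cox1}"
    else:
        status = "hybrid"
    return GeneticProfile(label, status)


if __name__ == "__main__":
    # The four marker combinations listed in the text.
    for calls in [("Sh", "Sh", "Sh"), ("Sb/Sc", "Sh", "Sh"),
                  ("Sh", "Sb", "Sb"), ("Sh", "Sc", "Sc")]:
        print(assign_profile(*calls))
```

Keeping the mitochondrial and nuclear calls as separate arguments makes the discordance that defines a hybrid explicit, rather than inferring it from a pre-built string.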

Data analysis

Parasitological and human-water contact data were recorded on survey forms, with an identifier for each child who provided a sample and for each individual observed in contact with the water. Hourly cercarial emission percentages were calculated by dividing the number of cercariae emitted per hour by the total number of cercariae emitted daily. Data were recorded in Microsoft Excel version 2016 (Redmond, Washington, USA). Calculations of prevalence, intensity of infection and freshwater snail infestation rates were performed using SPSS version 23.0 software (IBM, Chicago, Illinois, USA). Participants' ages were grouped into two categories, i.e. 6–10 years old and 11–14 years old. Multivariate statistical analyses were performed to assess the relationship between sociodemographic data and HWCP parameters. For human-water contact, percentages were compared between sites according to sex and age by calculating the proportion of each sex or age group within the population of each site. The association between the presence of blood in urine and water contact activities was assessed with a logistic regression model. Differences in proportions were tested using the chi-square test or Fisher's exact test, depending on the data. P-values below 0.05 were considered significant.
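
As a minimal sketch of the emission-percentage calculation described above, the snippet below divides each time-slot count by the daily total and reports the slot with the highest share. The counts are invented for illustration and are not study data. The function names and the two-hour slot labels are assumptions, chosen to match the counting interval given in the methods.

```python
def emission_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each slot's share (%) of the cercariae emitted over the day."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cercariae were counted for this snail")
    return {slot: 100.0 * n / total for slot, n in counts.items()}


def peak_slot(counts: dict[str, int]) -> str:
    """Return the time slot with the highest count (first one on ties)."""
    return max(counts, key=counts.get)


# Hypothetical counts for one snail, one count every two hours from 6:00 AM.
example_counts = {
    "06:00": 20, "08:00": 90, "10:00": 40, "12:00": 25,
    "14:00": 15, "16:00": 10, "18:00": 0, "20:00": 0,
    "22:00": 0, "00:00": 0, "02:00": 0, "04:00": 0,
}

if __name__ == "__main__":
    for slot, pct in emission_percentages(example_counts).items():
        print(f"{slot}: {pct:.1f}%")
    print("peak:", peak_slot(example_counts))
```

Expressing emission as a percentage of the daily total lets snails with very different cercarial output be compared on the same scale.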

Parasitological data

A total of 393 urine samples were examined for Sh ova (Table 1). The overall prevalence was 69.2% (272/393). The prevalence and intensity of infection were significantly higher in Diakalel than in Fangouné Bamanan (P < 0.0001). Conversely, there was no significant difference in prevalence and intensity with respect to the sex and age of the participants (P > 0.05).

Characteristics of human-water contact population survey

The human-water contact activities involved 97 participants, 58 in Fangouné Bamanan and 39 in Diakalel. In terms of sex, females were more numerous in both Diakalel and Fangouné Bamanan (P = 0.042). No child aged ≤ 5 years was observed in Diakalel. Participants aged 16 years and older predominated in Fangouné Bamanan, while those aged 6–15 years also made up a significant share in Diakalel (P = 0.003) (Table 2).

Water contact patterns and interactions

All human-water contact (HWC) activities varied significantly according to gender, age and duration of exposure (Table 3). The major water-contact activity in all the communities was domestic (62.9%), led by older females aged 16 and above (84.6%). Overall, the percentage of domestic activities decreases with age, while that of recreational activities increases with age. Children under the age of 5 engaged only in recreational water contact. Among those who engaged in recreational activities, children under 10 years old were the most represented. In terms of exposure duration, most study participants, 81.4% (79/97), were in contact with snail-infested freshwater for between 6 and 30 min. Contacts of this duration involved, in decreasing order, recreational, occupational and domestic activities (Table 3). Recreational activities, carried out mainly by children aged 6–10 years, were associated with the longest duration (60 min) of contact with infected water. The frequency of domestic activities varied with their nature. For instance, laundry, the most common activity (82.9%), was typically done once a week, while activities such as washing dishes were carried out daily. Swimming, primarily enjoyed by children, was the central recreational activity (Table 3).

Figure 3A shows the variation in the duration of exposure of participants to contaminated water at Fangouné Bamanan. Exposures of 6 to 30 min accounted for up to 74.1% of all participants. Domestic activities alone exposed up to 22.4% of the participants to cercaria-infested water for durations exceeding 30 min. In Diakalel, by contrast, more than 80% of the people were exposed to cercaria-infested water for 15–30 min, and only through domestic and recreational activities (Fig. 3B). In children aged 6–10 years, the frequency of arm/foot contact with water once a week was comparable to that of daily contact (Fig. 3C). In contrast, whole-body contact with surface water occurred every day in 90.9% (10/11) of these children (Fig. 3D).

figure 3

Human-water-contact activities (HWCA) frequency (%) associated with the duration of exposure of the participants to surface water systems, and human-water-contact (HWC) frequency associated with the exposed part of the body, in both communities, September 2022. A HWCA frequency (%) associated with the duration of exposure of participants to surface water systems at Fangouné Bamanan; B HWC frequency (%) associated with the duration of exposure of participants to surface water systems at Diakalel; C, D HWC frequency associated with the exposed part of the body (C arms/feet; D whole body) in children aged 6–10 years at Fangouné Bamanan and Diakalel

Hematuria and human-water-contact (HWC) activities

No significant association was observed between the prevalence of blood in urine and human-water contact activities in this study. However, those engaged in recreational and occupational activities had 0.09 and 0.24 times the odds of hematuria, respectively, compared with those engaged in domestic activities (Table 4).

Snail species, distribution and abundance

A total of 378 freshwater snails were collected at the 14 human-water contact points. Of these, 126 were collected at HWCP-H (human-water contact point H) among the 8 points along the Fangouné Bamanan stream and 252 at 3 HWCP (A, B, C) among the 6 Diakalel points along the tributaries of the Senegal River (Table 5 and Fig. 2). All the collected snails belonged to the species B. truncatus, identified by shell morphology. In Diakalel, where two types of habitat (the river and its tributaries) were examined, snails were found only in the tributaries; in Fangouné Bamanan, snails were collected on water lilies at 7 of the 8 sites. On several occasions, they were also associated with food scraps or various supports (rags, old boxes, pieces of wood, etc.) abandoned in the water. Other aquatic fauna encountered included fry, frogs, leeches and insect larvae.

As expected, numerous snails collected during the survey harbored cercariae. Two HWCP (A and G) in Fangouné Bamanan and two (A and B) in Diakalel yielded infected snails (Table 5, Fig. 4). Overall, 7.1% (27/378) of snails emitted Schistosoma cercariae. The prevalence of schistosome cercariae shedding (PSCS) was 15.0% (19/126) in Fangouné Bamanan and 3.3% (8/252) in Diakalel. Among the HWCP at each site, the highest PSCS was recorded at point G in Fangouné Bamanan, 24.3% (18/74), and at point A in Diakalel, 6.1% (7/111) (Table 5).
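
The PSCS figures above are simple proportions: shedding snails divided by snails examined. The sketch below shows that arithmetic for two of the reported counts. The function name is hypothetical, and the study itself reports using SPSS for these calculations, so this is an illustration of the formula rather than the authors' procedure.

```python
def shedding_prevalence(shedding: int, examined: int) -> float:
    """Prevalence of schistosome cercariae shedding (PSCS), in percent."""
    if examined <= 0:
        raise ValueError("at least one snail must be examined")
    if not 0 <= shedding <= examined:
        raise ValueError("shedding snails must be between 0 and examined")
    return 100.0 * shedding / examined


if __name__ == "__main__":
    # Counts taken from the text: overall, and point G in Fangouné Bamanan.
    print(f"overall: {shedding_prevalence(27, 378):.1f}%")
    print(f"point G: {shedding_prevalence(18, 74):.1f}%")
```

Reporting the numerator and denominator alongside each percentage, as the text does, lets readers recompute the rate for any contact point.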

figure 4

Map of human-water contact points (black dots) and associated transmission sites (red dots) in the two study communities (Diakalel and Fangouné Bamanan), September 2022. The red dots (A, B in Kayes and A, G in Fangouné Bamanan) indicate the transmission sites

Cercarial emission patterns

Curves representing the average daily peak in cercarial emissions (circadian rhythm) of B. truncatus snails from Fangouné Bamanan are shown in Fig. 5, which illustrates the variability in cercarial emission. We identified three distinct emission patterns at Fangouné Bamanan, with each curve representing the rhythmic cercarial emission of a group of snails sharing the same pattern:

figure 5

Cercarial emission patterns of Schistosoma haematobium from naturally infected snails: a Early diurnal pattern for G2-FB; b Midday diurnal pattern for G3-FB; c Early to late nocturnal pattern for G1-FB in Fangouné Bamanan, September 2021. Gx-FB corresponds to snail x collected at point G in Fangouné Bamanan

(i) An early diurnal pattern was observed in 6 of the 19 snails (31.6%). Cercarial emission commenced at 6:00 AM, coinciding with the onset of the light period, and peaked at 8:00 AM for G2-FB; (ii) a midday diurnal pattern was found in 11 of the 19 snails (57.9%), with the average emission peak at 2:00 PM for G3-FB; and (iii) a combination of early diurnal (6:00–8:00 AM) and nocturnal (6:00–8:00 PM) patterns was observed in 2 of the 19 snails (10.5%), for G1-FB.

Profiles of the average daily peak of cercarial emissions in B. truncatus snails from Diakalel are shown in Fig. 6. We identified two different patterns, and each snail showed only one peak. (a) An early diurnal pattern was observed in 4 of the 8 snails (50.0%), with cercarial emission peaking at 8:00 AM for A2-Dia. (b) A midday diurnal pattern was found in 4 of the 8 snails (50.0%), with the average emission peak at 2:00 PM for A1-Dia.

figure 6

Cercarial emission patterns of Schistosoma haematobium from naturally infected snails: a Early diurnal pattern for A2-Dia; b Midday diurnal pattern for A1-Dia in Diakalel, September 2022. Ax-Dia corresponds to snail x collected at point A in Diakalel

Genotyping of schistosomes for cox1 and ITS/18S rDNA genes

The genetic profiles of cercariae released by snails in September 2021 were determined (Table 6). The profiles varied according to the period and site of release. The cercariae released by snails collected at Diakalel gave two different profiles. Between 6:00 AM and 10:00 AM, all cercariae released were hybrids (Sb/Sc_ShxSh); between 10:00 AM and 4:00 PM, the cercariae had pure Sh (Sh_ShxSh) profiles. Two snails emitted both pure and hybrid cercariae. Similarly, at Fangouné Bamanan, the emitted cercariae showed two different profiles. Here, the pure Sh cercariae were found between 6:00 AM and 6:00 PM, whereas the hybrids (Sb/Sc_ShxSh) were emitted between 6:00 PM and 8:00 PM (Table 6). One snail emitted both pure and hybrid cercariae.

This study was conducted in the Senegal River Basin in Mali to examine the interactions between three essential elements of the schistosomiasis transmission cycle, i.e. the definitive host (human or animal), the freshwater snail (intermediate host) and the surface water systems where the hosts meet during various water contact activities. Epidemiological data on schistosomiasis prevalence in humans and snail intermediate hosts are crucial for identifying transmission sites and the parasite's distribution in a particular area, and for informing decision-makers and control programs. The current WHO roadmap, which builds on integrated methods of control and elimination, sets the goal of eliminating schistosomiasis as a public health problem by 2030 [ 45 ]. An in-depth understanding of the disease context in sub-Saharan Africa, particularly of ongoing transmission in endemic zones, is therefore required to achieve this goal. In this study, 69.2% of the schoolchildren were infected with Sh. This high prevalence is consistent with findings from the same area, where the prevalence of urinary schistosomiasis was 84.4% in preschool-aged children, especially in Fangouné Bamanan [ 37 ]. Our findings were also consistent with the prevalence of 72.4% previously reported at the Office du Niger, Central Mali [ 7 ], but higher than the 14.0% recorded in Côte d'Ivoire [ 5 ].

The interactions between humans and freshwater sources play a significant role in schistosomiasis transmission, especially when the specific type of water-contact activity is considered. Direct dependence on unsafe water sources, i.e. surface water, increases the probability of exposure to contaminated water and thus the risk of Sh infection. In this study, we found that the people of Fangouné Bamanan and Diakalel are, at a given time, highly dependent on surface water from the streams and the Senegal River, respectively, for their domestic, professional or recreational needs. Consequently, these freshwater sources are widely used for various purposes, including washing, laundry, fetching water for domestic cooking, fishing, river crossing, animal watering, bathing, swimming and other recreational activities. Although sanitation facilities generally exist in both communities, their low use was the key factor driving community members, especially children, to engage in open defecation (data not shown). A common practice observed in this study was the unsanitary disposal of urine and feces, which resulted in the contamination of water bodies with schistosome eggs. Proximity to these freshwater bodies also plays a significant role in the unsanitary practices of the inhabitants of the study communities: for instance, several houses in the Fangouné Bamanan community are built on the bank of the stream, with toilet facilities built at the riverbank. This setup provides a more comfortable and well-ventilated environment for open defecation, which unfortunately discourages the use of latrines. This is consistent with Schmidlin (2013) [ 46 ], who reported that poor hygiene and sanitation linked to the insanitary disposal of urine or feces play crucial roles in the transmission cycle of schistosomes, as eggs are released into water bodies via excreta. 
Larvae hatching from the eggs infect the snails (intermediate hosts), which in turn release the parasites that infect humans [ 47 , 48 ]. Water contamination by human urine is the initial step in the infection of Bulinus snails by miracidia. Subsequently, this contaminated water serves as the source of infection for individuals who come into contact with it, and the risk of infection is further increased by the duration of exposure, which ranged between six and thirty minutes for 74% of participants. Given that the penetration time for cercariae has been estimated at less than 10 min for Sm [ 49 ], it is noteworthy that the exposure time for 71% of participants exceeded 15 min, with some exceeding 60 min; such extended exposure is considered sufficient for cercarial penetration.

The type of water-contact activity also plays an important role in schistosome transmission. In this study, domestic activities were predominantly practiced by females (87%), followed by recreational activities practiced mainly by children aged 6–14 years old (67%). It has been recognized that the frequency and duration of water contact are influenced by the type of water-contact activity, which in turn correlates with the level of exposure. Whereas domestic water-contact activities involve more frequent (but rather short) contacts, recreational water-contact activities occur less frequently but usually last longer. In our study, recreational activities such as swimming led to longer exposures; the same activity was also positively correlated with longer water-contact durations in the Shinyanga District of Tanzania, thereby increasing exposure to schistosomiasis [ 50 ]. Similar patterns were observed in Senegal, where the exposure of women and children to cercariae was influenced by the frequency and duration of water contacts [ 51 ]. This indicates that the type of water interaction is an important factor mediating exposure to cercariae and the risk of schistosomiasis. As swimming usually involves longer water-contact durations as well as full submersion of the body, it is not surprising that recreational and domestic activities are significantly associated with the presence of blood in urine. However, contrary to what was reported in the Lower Densu River basin in Ghana, more frequent water contacts (more than twice per week) and longer water-contact durations (more than 30 min) did not show a significant increase in the odds of hematuria [ 6 ]. 
Similarly, a study in Nigeria highlighted that direct water contact exposes individuals to cercariae and thus places them at risk of infection, especially those who depend directly on freshwater as a source of livelihood [ 52 ]. Likewise, fetching water for household use and other activities that involve frequent water contact were determinants of relatively high levels of exposure to cercariae among children in the Densu Basin in Ghana [ 53 ]. Domestic water contacts, such as washing clothes or kitchen utensils, and recreational water contacts, such as swimming, were recorded as the main exposure factors in our two study sites. These results are supported by recent studies showing that WASH and sex are less influential risk factors for infection than water contact, in terms of the magnitude of the association between exposure and schistosome infection [ 53 ]. These studies revealed that having any water contact was associated with 3.14 times higher odds of infection compared with no water contact. It is therefore evident that water contact is common and, in many cases, unavoidable.

Examination of 126 snails showed that B. truncatus was the only snail intermediate host of human schistosomiasis encountered. In contrast to these results, the malacological fauna is richer in the irrigated rice area of the Office du Niger and in the suburban area of Bamako, and includes other vector species such as B. globosus for Sh and Biomphalaria pfeifferi for Sm [ 23 , 24 , 25 ]. In our study, all the host snails were found only in the streams in Fangouné Bamanan and in the Senegal River tributaries in Diakalel, because of the slow flow of these waters, whose surface is abundantly covered with aquatic plants (Pistia stratiotes and Nymphaea micrantha), grasses or bushes in the riverbed. In the Senegal River, however, the intense water current, waves and lack of vegetation cover prevent any snail settlement. The overall natural prevalence of Schistosoma spp. cercarial shedding was 7.1% (27/378), indicating the existence of intense foci of parasite transmission. The prevalence of Schistosoma spp. infection in the snails was higher in Fangouné Bamanan (15.0%) than in Diakalel (3.1%) and than that recorded in Bamako (8.3%) [ 18 ], but lower than that observed in the Office du Niger (up to 24%) [ 25 ]. In the Niger River Valley (NRV) in Niger, the prevalence of Schistosoma spp. infection was low in B. forskalii (0.2%, 24/11,989), also low in B. truncatus (0.8%, 342/42,500) and relatively high in Biomphalaria pfeifferi (3.4%, 79/2290) [ 54 ]. Despite the large number of snails collected and the high number of sites surveyed in the Niger valley, infection rates there remain low compared with our results. These results from the Niger Valley and elsewhere [ 55 ] show that natural infestation rates are generally low. Careful selection of the water contact point where snails are caught, i.e. 
where they are most likely to be infested through excreta (urine and feces), has more influence on the snail infestation rate than a high number of samples caught or sites surveyed. In other words, isolated HWCP that are seldom frequented by the population can provide many samples, almost all of which will be negative. In our study, beyond the intensity of water contact activities, the village of Fangouné, for example, is located almost on the riverbed, offering young children the opportunity to defecate there, given the difficulty of accessing traditional toilets that are less comfortable for them.

Our results on the chronobiology of Schistosoma spp. cercarial emission in Fangouné Bamanan and Diakalel in the Kayes region showed that the rhythm of emergence was of a circadian type.

The first pattern, an early diurnal pattern peaking between 6:00 AM and 8:00 AM, exhibited a hybrid cercarial emission pattern; it was observed in 6 of the 19 B. truncatus examined (31.6%) (G2-FB in Fangouné Bamanan; A2-Dia in Diakalel). Such a pattern has also been observed in Sb from Benin [ 56 ], Sardinia (Italy), Sudan and Spain [ 57 ] and Niger [ 58 ]. The difference between the genetic profiles could be explained by the nature of the molecular tools used to identify the nuclear gene, i.e. ITS in previous studies and ARMS in our study. The second pattern was observed in 47.4% of snails (9 B. truncatus; G3-FB in Fangouné Bamanan; A1-Dia in Diakalel; Figs. 5 and 6), with cercariae emitted during daylight hours, between 10:00 AM and 6:00 PM. It was similar to what has been published on Sh in humans from Algeria [ 59 ], Morocco [ 29 ], Niger [ 57 ], and Gabon [ 31 ]. The third pattern, corresponding to a typical early pattern for Sh accompanied by a diurnal pattern for Sb/Sc_ShxSh, with emergence peaking around 3:00 PM and 7:00 PM respectively, was found in 2 of 19 snails (10.5%) (G1-FB in Fangouné Bamanan). Such a double peak in cercarial emergence was reported for Sb for the first time in Benin [ 23 ]. For another animal schistosome, S. margrebowiei, two emergence peaks per day were described, the first occurring 1 h after the onset of daylight and the second 1 h after the onset of darkness [ 60 ]. On the other hand, many authors maintain that cercarial emergence behavior is of genetic origin [ 61 ]. Thus, even if the schistosome is subjected to different environmental pressures at the level of its intermediate snail host, its behavior remains unchanged. This is the case for the emission profile of Sh from the snail B. truncatus, which does not change when the snail is also infected with another schistosome species, Sb [ 29 ]. 
Regardless of the genetic basis of this behavior, the two-peak cercarial emergence observed in our study could be attributed either to a single species, Sb, as demonstrated in Benin [ 56 ], or to two different species, the second of which remains to be identified at the molecular level. In the latter case, the two-peak cercarial emergence found in a single snail suggests that two miracidia succeeded in developing in that snail, leading to the double peak in cercarial emergence. This is the case in our study, where a single snail was infected with miracidia of Sb, Sc and Sh, leading to the hybrid Sb/Sc_ShxSh.

From an evolutionary point of view, emergence times are usually well correlated with the times when the putative definitive host species are present in the water and available for infection [ 62 ]. In Diakalel and Fangouné Bamanan, the circadian rhythm of emissions, with a peak around 3:00 PM, can be explained by human contact with water, which is essentially related to bathing during the hottest hours of the day and to domestic activities such as laundry and washing kitchen utensils at any time of the day. In the particular case of Fangouné Bamanan, a rural area, activities other than bathing, such as artisanal fishing, can shift the cercarial emission pattern to a very particular one, with a primary peak around 3:00 PM but secondary peaks at dawn and dusk, when fishers (school-aged children and young adults) are in contact with water, as has been shown for Sm in Benin [ 63 ].

Regarding the limitations of our study, the results were generated from surveys carried out as part of a single-round cross-sectional study. Given potential spatial and temporal variations in malacological and human-water contact data, surveys should be repeated over at least two or three years. Because the study relies on self-reported cases of blood in urine, some degree of reporting bias must be expected; in particular, the effects of sex and age must be interpreted with great caution. Although all children engage in multiple water-contact activities, only the predominant ones were reported, which may affect the estimated effects of the individual water-contact activities. The study design assumes that all children have some degree of exposure; there was therefore no control group, which precluded a robust case-control analysis. Hybrid strains were identified, but a larger number of cercariae, at least a hundred, could give better results on the hybrid cercariae observed. We assume that a technical roadmap supporting the coherence of the document could be drawn by organically combining the three parts (human-water contact findings, malacological data and cercarial chronobiology).

This study evaluated human-water contact and its influence on the patterns and genetic profile of cercariae emitted by snails, the intermediate hosts of urogenital schistosomiasis. Our results suggest that in Mali, domestic activity, which seems to be carried out only by women, was the main factor predisposing to schistosomiasis infection, followed by recreational activity practised mainly by children. The infection risk in the populations stemmed from the presence of infected snails (Bulinus truncatus and B. globosus) combined with a chronobiological polymorphism in the emergence rhythm of the cercariae released from these snails, a consequence of the contamination of the water by human excrement (urine or stool). The cercarial emissions of naturally infected snails were observed early in the day and in the middle of the day (in Diakalel and Fangouné Bamanan) and also in the first two hours of the night in Fangouné Bamanan. The molecular data from cercariae collected at Fangouné Bamanan showed a unique S. haematobium profile with a chronobiological polymorphism, suggesting (i) an adaptation of the parasite to the time of human (or animal) host water contact or (ii) a broadening of the host infection spectrum by the parasite to increase its survival, which is a consequence of hybridization between human and animal schistosome species. Further studies on animal reservoir hosts, such as domestic livestock and small commensal mammals such as rodents, in these sites could provide more complete information on the dynamics of water contact activities and help to better explain certain chronobiological profiles that we observed. These data could help to adapt local measures for sustainable control of the disease.

Availability of data and materials

Not applicable.

Abbreviations

HWC: Human-water contact

Rapid diagnostic

ARMS: Amplification refractory mutation system

PCR: Polymerase chain reaction

ITS: Internal transcribed spacer

Sh: Schistosoma haematobium

Sm: Schistosoma mansoni

Sc: Schistosoma curassoni

MDA: Mass drug administration

WHO: World Health Organization

HWCP: Human-water contact points

GPS: Global positioning system

PSCS: Prevalence of schistosome cercariae shedding

Stadley J, Dobson P, Stothard JR. Out of animals and back again: schistosomiasis as a zoonosis in Africa. Schistosomiasis InTech. 2012. https://doi.org/10.1186/s13071-019-3745-8 .

Léger E, Borlase A, Fall C, Diouf N, Diop S, Yasenev L, et al. Prevalence and distribution of schistosomiasis in human, livestock, and snail populations in northern Senegal: a one health epidemiological study of a multi-host system. Lancet Planet Heal. 2020;4(8):e330–42.

WHO. Schistosomiasis (Bilharzia). 2023. https// www.who.int › Heal Top. Accessed 10 Mar 2023.

Cisse M, Sangare I, Djibougou A, Tahita MC, Gnissi S, Bassinga JKW, et al. Prevalence and risk factors of Schistosoma mansoni infection among preschool-aged children from Panamasso village, Burkina Faso. Parasit Vectors. 2021;14(1):185.

Angora E, Boissier J, Menan H, Rey O, Tuo K, Touré AO, et al. Prevalence and risk factors for schistosomiasis among schoolchildren in two settings of Côte d’Ivoire. Trop Med Infect Dis. 2019;4(3):110.

Ntajal J, Evers M, Kistemann T, Falkenberg T. Influence of human–surface water interactions on the transmission of urinary schistosomiasis in the Lower Densu River basin, Ghana. Soc Sci Med. 2021;288:113546.

Article   PubMed   Google Scholar  

Ly B, Yaro AS, Sodio B, Sacko M. Persistance de la schistosomiase urinaire en zones endémiques soumises aux traitements de masse répétés au Mali. Int J Biol Chem Sci. 2019;13(1):369–81.

Pennance T, Allan F, Emery A, Rabone M, Cable J, Garba A, et al. Interactions between Schistosoma haematobium group species and their Bulinus spp. intermediate hosts along the Niger River Valley. Parasit Vectors. 2020;13(1):268.

Article   CAS   PubMed   PubMed Central   Google Scholar  

Amuga GA, Nebe OJ, Nduka FO, Njepuome N, Dakul DA, Isiyaku S, et al. Schistosomiasis: epidemiological factors enhancing transmission in Nigeria. Med Full Length Artic. 2020;8:023–32.

Google Scholar  

Jones I, Sokolow S, Chamberlin A, Lund A, Jouanard N, Bandagny L, et al. Schistosome infection in Senegal is associated with different spatial extents of risk and ecological drivers for Schistosoma haematobium and S mansoni. PLoS Negl Trop Dis. 2021;15(9): e0009712.

Huyse T, Van den Broeck F, Hellemans B, Volckaert FAM, Polman K. Hybridisation between the two major African schistosome species of humans. Int J Parasitol. 2013;43(8):687–9.

Article   CAS   PubMed   Google Scholar  

Hotez P, Kamath A. Neglected tropical diseases in sub-Saharan Africa: Review of their prevalence, distribution, and disease burden. PLoS Negl Trop Dis. 2009;3(8): e412.

Abe E, Guan W, Guo Y, Kassegne K, Qin Z, Xu J, et al. Differentiating snail intermediate hosts of Schistosoma spp. using molecular approaches: fundamental to successful integrated control mechanism in Africa. Infect Dis Poverty. 2018;7(1):29.

Gaud J. Les bilharzioses en Afrique occidentale et en Afrique centrale. Bull WHO. 1955;13(2):209–58.

CAS   PubMed   PubMed Central   Google Scholar  

Brinkmann U, Werler C, Traore M, Korte R. The national Schistosomiasis Control Programme in Mali, objectives, organization, results. Trop Med Parasitol. 1988;39(2):157–61.

CAS   PubMed   Google Scholar  

Brinkmann U, Korte R, Schmidt-Ehry B. The distribution and spread of schistosomiasis in relation to water resources development in Mali. Trop Med Parasit Organ Dtsch GTZ. 1988;39(2):182–5.

CAS   Google Scholar  

Traoré M, Maude G, Bradley D. Schistosomiasis haematobia in Mali: Prevalence rate in school-age children as index of endemicity in the community. Trop Med Int Heal. 1998;3(3):214–21.

Dabo A, Diarra A, Machault V, Touré O, Niambélé DS, Kanté A, et al. Urban schistosomiasis and associated determinant factors among school children in Bamako, Mali, West Africa. Infect Dis Poverty. 2015;29(4):4.

Agniwo P, Boissier J, Sidibé B, Dembélé L, Diakité A, Doumbo SN, et al. Genetic profiles of Schistosoma haematobium parasites from Malian transmission hotspot areas. Parasit Vectors. 2023;16(1):263.

Webster B, Diaw O, Seye M, Webster J, Rollinson D. Introgressive hybridization of Schistosoma haematobium group species in senegal : species barrier break down between ruminant and human Schistosomes. PLoS Negl Trop Dis. 2013;7(4): e2110.

Angora EK, Vangraefschepe A, Allienne J-F, Menan H, Coulibaly JT, Meïté A, et al. Population genetic structure of Schistosoma haematobium and Schistosoma haematobium × Schistosoma bovis hybrids among school-aged children in Côte d’Ivoire. Parasite. 2022;29:23.

Onyekwere A, Rey O, Allienne J, Nwanchor MC, Alo M, Uwa C, et al. Population genetic structure and hybridization of Schistosoma haematobium in Nigeria. Pathogens. 2022;11(4):425.

Madsen H, Coulibaly G, Furu P. Distribution of freshwater snails in the river Niger basin in Mali with special reference to the intermediate hosts of schistosomes. Hydrobiologia. 1987;146:77–88.

Coulibaly G, Madsen H. Seasonal density fluctuations of intermediate hosts of schistosomes in two streams in Bamako, Mali. J African Zool. 1990. https://doi.org/10.5555/19930883392 .

Dabo A, Diop S, Doumbo O. Distribution des mollusques hôtes intermédiaires des schistosomiases humaines à l’office du Niger (Mali). II: Rôle des différents habitats dans la transmission. Bull la Société Pathol Exot. 1994;87(3):164–9.

Chadeka E, Nagi S, Sunahara T, Cheruiyot N, Bahati F, Ozeki Y, et al. Spatial distribution and risk factors of Schistosoma haematobium and hookworm infections among schoolchildren in Kwale, Kenya. PLoS Negl Trop Dis. 2017;11(9): e0005872.

Rollinson D, Southgate VR. The genus Schistosoma : a taxonomic appraisal. Bio Schistosomes Genes Latrines. 1987. https://doi.org/10.5555/19880851450 .

Boissier J, Mouahid G, Moné H. Schistosoma spp. In: Rose JB, Jiménez-Cisneros B, editors. Global water pathogen project. United States: Michigan State University; 2019.

Mouahid A, Chaib A, Animate LDB, De Villeneuve A, Sraghna K. Cercarial shedding patterns of Schistosoma bovis and S . haematobium from single and mixed infections of Bulinus truncatus. J Helminthol. 1991;65(1):8–14.

Ibikounle M, Mone H, Abou Y, Kinde-gazard D, Sakiti NG, Mouahid G, et al. Premier cas de chronobiologie des émissions cercariennes de type infradien chez Schistosoma mansoni dans deux foyers du sud-Bénin. Int J Bio Chem Sci. 2012;6(3):1081–9.

Mintsa-ngue R, Ibikounle M, Mengue K. Cercarial emergence pattern of Schistosoma haematobium from Libreville. Gabon Parasite. 2014;21:3.

Théron A. Chronobiology of trematode cercarial emergence: from data recovery to epidemiological, ecological and evolutionary implications. Adv Parasitol. 2015;88:123–64.

Tchuem Tchuenté LA, Rollinson D, Stothard JR, Molyneux D. Moving from control to elimination of schistosomiasis in sub-Saharan Africa: time to change and adapt strategies. Infect Dis Poverty. 2017;6(1):42.

WHO. Prevention and control of schistosomiasis and soil-transmitted helminthiasis. WHO Tech Rep Series. 2002;912:1–57.

WHO. Preventive chemotherapy in human helminthiasis: coordinated use of anthelminthic drugs in control interventions. Geneva: WHO; 2006.

Clements ACA, Bosqué-Oliva E, Sacko M, Landouré A, Dembélé R. A comparative study of the spatial distribution of schistosomiasis in Mali in 1984–1989 and 2004–2006. PLoS Negl Trop Dis. 2009;3(5): e431.

Dabo A, Mahamat Badawi H, Bary B, Doumbo OK. Urinary schistosomiasis among preschool-aged children in Sahelian rural communities in Mali. Parasit Vectors. 2011;4:21.

Soentjens P, Cnops L, Huyse T, Yansouni C, De VD, Bottieau E, Esbroeck CJ, et al. Diagnosis and Clinical Management of Schistosoma haematobium – Schistosoma bovis hybrid infection in a cluster of travelers returning from Mali. Clin Infect Dis. 2016;63(12):1626–9.

King CH, Bertsch D. Historical perspective: snail control to prevent schistosomiasis. PLoS Negl Trop Dis. 2015;9(4): e0003657.

Taylor MG. Hybridisation experiments on five species of African Schistosomes. J Helminthol. 1970;44(3):253–314.

Philibert A, Tourigny C, Coulibaly A, Fournier P. Birth seasonality as a response to a changing rural environment (kayes region, Mali). J Biosoc Sci. 2013;45(4):547–65.

Ministry of health and public hygiene. 2017 Master plan for the control of neglected tropical diseases (NTD). National Health Directorate 2017–2021.79.

Reitzug F, Ledien J, Chami G. Associations of water contact frequency, duration, and activities with schistosome infection risk: a systematic review and meta-analysis. PLoS Negl Trop Dis. 2023;17(6): e0011377.

Mouahid A, Théron A. Schistosoma bovis : patterns of cercarial emergence from snails of the genera Bulinus and Planorbarius . Exp Parasitol. 1986;62(3):389–93.

WHO. Schistosomiasis and soil-transmitted helminthiases: progress report. Geneva: WHO; 2022. p. 667–76.

Schmidlin T, Hürlimann E, Silue K, Yapi R, Houngbedji C, Kouadio B, et al. Effects of hygiene and defecation behavior on helminths and intestinal Protozoa infections in Taabo, Côte d’Ivoire. PLoS ONE. 2013;8(6): e65722.

Kulinkina A, Kosinski K, Adjei M, Osabutey D, Gyamfi B, Biritwum N, et al. Contextualizing Schistosoma haematobium transmission in Ghana : assessment of diagnostic techniques and individual and community water-related risk factors. Acta Trop. 2019;194:195–203.

Martel R, Osei B, Kulinkina A, Naumova E, Abdulai A, Tybor D, et al. Assessment of urogenital schistosomiasis knowledge among primary and junior high school students in the Eastern Region of Ghana: a crosssectional study. PLoS ONE. 2019;14(6): e0218080.

Haas W, Haeberlein S. Pénétration des cercaires dans la peau humaine vivante: Schistosoma mansoni vs Trichobilharzia szidati . Parasitol Res. 2009;105(4):1061–6.

Angelo T, Buza J, Kinung Hi S, Kariuki H, Mwanga J, Munisi D, et al. Geographical and behavioral risks associated with Schistosoma haematobium infection in an area of complex transmission. Parasit Vectors. 2018;11(1):481.

Ciddio M, Mari L, Sokolow S, De Leo G, Casagrandi R, Gatto M. The spatial spread of schistosomiasis: a multidimensional network model applied to Saint-Louis region. Senegal Adv Water Resour. 2017;108:406–15.

Ajakaye O, Adedeji O, Ajayi PO. Modeling the risk of transmission of schistosomiasis in akure North local government area of ondo state, Nigeria using satellite-derived environmental data. PLoS Neglected Trop Dis. 2017;11(7): e0005733.

Codjoe SN, Larbi R. Climate change/variability and schistosomiasis transmission in Ga district. Ghana Clim Dev. 2016;8(1):58–71.

Rabone M, Wiethase JH, Allan F, Gouvras AN, Pennance T, Hamidou AA, et al. Freshwater snails of biomedical importance in the Niger River Valley: evidence of temporal and spatial patterns in abundance, distribution and infection with Schistosoma spp. Parasit Vectors. 2019;12(1):498.

Gouvras AN, Allan F, Kinung’Hi S, Rabone M, Emery A, Angelo T, et al. Longitudinal survey on the distribution of Biomphalaria sudanica and B . choanomophala in Mwanza region, on the shores of Lake Victoria, Tanzania: Implications for schistosomiasis transmission and control. Parasit Vectors. 2017;10(1):316.

Savassi BAES, Dobigny G, Etougbétché JR, Avocegan TT, Quinsou FT, Gauthier P, et al. Mastomys natalensis (Smith, 1834) as a natural host for Schistosoma haematobium (Bilharz, 1852) Weinland, 1858 x Schistosoma bovis Sonsino, 1876 introgressive hybrids. Parasitol Res. 2021;120(5):1755–70.

Pages JR, Théron A. Analysis and comparison of cercarial emergence rhythms of Schistosoma haematobium , S . intercalatum , S . bovis , and their hybrid progeny. Int J Parasitol. 1990;20(2):193–7.

Mouchet F, Theron A, Brémond P, Sellin E, Sellin B. Pattern of cercarial emergence of Schistosoma curassoni from Niger and comparison with three sympatric species of schistosomes. J Parasitol. 1992;78(1):61–3.

Kechemir N, De Theron A. Intraspecific variation in Schistosoma haematobium from Algeria. J Helminthol. 1997;71(1):29–34.

Southgate VR, Knowles RJ. On Schistosoma margrebowiei Le Roux, 1933: the morphology of the egg, miracidium and cercaria, the compatibility with species of Bulinus , and development in Mesocricetus auratus . Zeitschrift für Parasitenkd. 1977;54(3):233–50.

Article   CAS   Google Scholar  

Théron A, Combes C. Genetic analysis of cercarial emergence rhythms of Schistosoma mansoni . Behav Genet. 1988;18(2):201–9.

Combes C, Fournier A, Moné H, Théron A. Behaviours in trematode cercariae that enhance parasite transmission: patterns and processes. Parasitolog. 1994;109(Suppl):S3-13.

Ibikounlé M, Mouahid G, Nguéma RM, Sakiti NG, Kindé-gasard D. Experimental parasitology life-history traits indicate local adaptation of the schistosome parasite, Schistosoma mansoni , to its snail host, Biomphalaria pfeifferi . Exp Parasitol. 2012;132(4):501–7.

Download references

Acknowledgements

We would like to thank Mr. Mamadou TRAORE, chief of Fangouné Bamanan village, and all the breeders; Mr. Mamadou SIDIBE, Director of Diakalel school, and all the schoolchildren; and the school and health authorities and the population of Kayes for their valuable contribution to the success of the study. The authors especially thank Amadou Dabo, known as Boua, for his relentless efforts in transporting the team and the equipment; the staff of the MRTC (Malaria Research and Training Center); and the staff of the Faculty of Medicine and Dentistry and the Faculty of Pharmacy. We thank ARES Trading S.A., an affiliate of Merck KGaA, Darmstadt, Germany, for making funding available, and the IHPE laboratory, Univ. Montpellier, CNRS, Ifremer, Univ. Perpignan Via Domitia, Perpignan, France, for hosting us.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations

Department of Epidemiology of Infectious Diseases, Faculty of Pharmacy, University of Sciences, Techniques and Technologies of Bamako, IRL 3189, Bamako, Mali

Bakary Sidibé, Privat Agniwo, Assitan Diakité, Safiatou Niaré Doumbo, Ahristode Akplogan, Hassim Guindo, Laurent Dembélé, Abdoulaye Djimde & Abdoulaye Dabo

Centre de Recherche Pour La Lutte Contre Les Maladies Infectieuses Tropicales (CReMIT/TIDRC), Université d’Abomey-Calavi, Abomey-Calavi, Bénin

Privat Agniwo, Boris Agossou Eyaton-olodji Sègnito Savassi & Moudachirou Ibikounlé

IHPE, Univ. Montpellier, CNRS, Ifremer, Univ. Perpignan Via Domitia, Perpignan, France

Boris Agossou Eyaton-olodji Sègnito Savassi & Jérôme Boissier


Contributions

DA, DL, AP and SDN conceived the study. AP, SB, DA, SDN, GH and AAB prepared the material and collected and analyzed data in the field. DA and DL obtained financing. SDN, DA, AP, IM and SBAES designed the methodology. DA, AP, DL, SDN, IM, SBAES and BJ validated the results. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Bakary Sidibé .

Ethics declarations

Ethics approval and consent to participate

The study protocol was approved by the Institutional Ethics Committee of the Faculty of Medicine and Odontology of Bamako under reference number 2018/71/CE/FMPOS. Prior to conducting the study, authorities (school staff and community leaders) were consulted to gain full access to communities and grassroots schools. School authorities, teachers, parents/guardians and children were informed of the objectives, procedures and potential risks and benefits of the study. Verbal informed consent was obtained from the children's parents or legal guardians, and assent from the children themselves. Participation in the study was voluntary, so children could refuse to participate without any consequences. After sampling, children with Schistosoma infection were treated with praziquantel in accordance with the WHO guidelines (40 mg/kg). For data protection purposes, an identification number was assigned to each participant.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Ethical approval

Ethical permission was obtained from the Ethics Committee of the “Faculté de Médecine et d’Odontostomatologie (FMOS) de Bamako, Mali”.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Sidibé, B., Agniwo, P., Diakité, A. et al. Human-water interactions associated to cercarial emergence pattern and their influences on urinary schistosomiasis transmission in two endemic areas in Mali. Infect Dis Poverty 13, 62 (2024). https://doi.org/10.1186/s40249-024-01229-w

Received: 16 April 2024

Accepted: 02 August 2024

Published: 29 August 2024

DOI: https://doi.org/10.1186/s40249-024-01229-w


  • Schistosomiasis
  • Chronobiology
  • Cercarial emission
  • Water contact

Infectious Diseases of Poverty

ISSN: 2049-9957


  • Open access
  • Published: 30 August 2024

Prognostic value of KLFs family genes in renal clear cell carcinoma

  • MengRu Fu 1,2
  • YuanZhuo Du 2,3
  • Fei Liu 1,2
  • Jun Xiao 1
  • Li Zhang 1
  • Yan Zeng 1
  • YuJuan Yang 1
  • Yan Yan 1,2

Scientific Reports volume 14, Article number: 20204 (2024)

  • Cancer epigenetics
  • Cancer genetics
  • Cancer genomics
  • Renal cancer
  • Urological cancer

Numerous studies have shown that the Krüppel-like factors (KLFs) family of transcription factors regulate various eukaryotic physiological processes including the proliferation, differentiation, senescence, death, and carcinogenesis of animal cells. In addition, they are involved in the regulation of key biological processes such as cell cycle, DNA repair, and immune response. Current studies focus on investigating the role of KLFs in normal physiological conditions and the incidence and development of diseases. However, the significance of KLFs family genes in clear cell renal cell carcinoma (ccRCC) remains only partly understood; therefore, an in-depth investigation of their role and clinical value in this cancer is desired. The study aimed to investigate the role of KLF family genes in the incidence, development, and prognosis of ccRCC, and to identify the related potential biomarkers and therapeutic targets. The expression of KLFs in the RNA sequencing data of 613 ccRCCs from the TCGA database was analyzed using R software, and UALCAN and GEPIA were used to assess the expression of KLF genes in ccRCC. Real-time fluorescence quantitative PCR analysis was performed using 10 pairs of ccRCC and adjacent tissue samples from the First Affiliated Hospital of Nanchang University and renal cancer cell lines. Overall survival (OS), progression-free interval (PFI), and disease-specific survival (DSS) of Kidney Clear Cell Carcinoma (KIRC) samples with differential expression of KLFs in the TCGA database were analyzed using the R software, followed by generating a nomogram prediction model. GSCALite was used to assess the interactions of KLF genes with miRNAs and to generate network maps. Protein interaction network maps of 50 neighboring genes associated with KLF mutations were analyzed using STRING with GO and KEGG functional enrichment analyses. cBioPortal was used to determine the probability of KLF gene mutations and their impact on OS and disease-free survival (DFS) in patients with ccRCC. 
Immune cell infiltration of KLFs was analyzed using TIMER. Finally, GSCALite was used to analyze the drug sensitivity and associated pathways of action of KLFs. These correlations were then validated using cellular experiments. KLF3/5/9/15 were significantly downregulated in ccRCC tissues, whereas KLF16/17 were upregulated compared with the adjacent tissues. Patients with high mRNA levels of KLF16/17 showed significantly lower OS, PFI, and DSS, whereas KLF3/5/9 showed the reverse trend. In patients with ccRCC, a significant correlation was observed between KLF mutations and OS and DSS. Furthermore, the correlation of KLF3/5/9 with immune cell infiltration was stronger than that of KLF15/16, while KLF17 was significantly associated with the Epithelial-Mesenchymal Transition (EMT) pathway. Overexpression of KLF5 inhibited the proliferative and migratory capacity of renal cancer cells (786-O and OS-RC-2), as well as their sensitivity to relevant small molecule drugs. Our research revealed the expression levels and biological significance of KLF genes in ccRCC, particularly highlighting the potential of KLF5 as a promising biomarker and therapeutic target for effective prognosis and diagnosis of ccRCC.


Introduction

The underlying mechanisms for the development of clear cell renal cell carcinoma (ccRCC), particularly the incidence and progression, remain poorly understood. ccRCC is one of the prevalent causes of cancer-related deaths globally [1]. According to global cancer research, a total of 371,700 cases of kidney cancer were reported worldwide in the year 2019, with an age-standardized incidence rate (ASIR) of 4.6 per 100,000 [2]. Moreover, the global incidence of kidney cancer increased by 154.78% compared with that in the year 1990. During these three decades, the age-standardized mortality rate (ASMR) was persistently high [3], with ccRCC being the most prevalent form [4]. Although a majority of the patients with ccRCC undergo surgery, local spread or distant metastasis occurs in approximately 30%, thereby reducing the overall 5-year survival rate to approximately 58% [5]. Early diagnosis and prompt treatment are effective in treating ccRCC. Therefore, the identification of more precise biomarkers and therapeutic targets for ccRCC is urgently required.

Recently, the role of Krüppel-like factors (KLFs) family genes in ccRCC has been widely investigated. KLFs are a class of zinc finger-containing transcription factors that are crucial in regulating various biological processes, including cell proliferation, differentiation, and survival [6]. KLFs promote or inhibit the expression of target genes by binding to specific DNA sequences. Particularly in ccRCC, the expression of several KLF family genes is associated with the occurrence, development, invasion, and prognosis of the tumor [7]. For example, KLF6, an important member of the KLF family, has a potential oncogenic role in tumorigenesis and progression. In patients with ccRCC, a downregulated expression of KLF6 was associated with poor prognosis [8]. Similarly, KLF4 is crucial in the normal functioning of renal cells and is associated with the progression of ccRCC; therefore, it may serve as a potential therapeutic target. Notably, in vitro and in vivo studies have reported that overexpression of KLF4 inhibits the proliferation of renal cancer cells and induces apoptosis, highlighting its potential application in anti-tumor therapy [9].

Recent research has also revealed the role of KLFs in regulating the immune response and inflammatory reactions, particularly those crucial in the microenvironmental regulation of ccRCC. Reportedly, KLF2 and KLF4 influence the tumor microenvironment and associated immune cells [10]. In addition, the role of KLFs in metabolic alterations of renal cancer cells offers potential leads for developing novel metabolic intervention strategies [11]. Owing to their multifaceted roles in the pathophysiology of renal cell carcinoma, KLFs could serve as valuable targets in future therapeutic strategies.

Materials and methods

Patients and tumor samples

We analyzed the mRNA expression of KLFs using R software (“stats” and “ggplot2” packages) on 613 RNA-seq results obtained from the public TCGA database (https://portal.gdc.cancer.gov), including 541 ccRCC samples and 72 adjacent tissue samples, followed by data visualization. Simultaneously, 10 pairs of ccRCC and adjacent tissue samples were collected at the Department of Urology, the First Affiliated Hospital of Nanchang University between February 2021 and December 2022. Ethics approval was obtained from the Ethics Committee of the First Affiliated Hospital of Nanchang University, reference number: (2022) CDYFYYLK (10–011). All methods were performed in accordance with the relevant guidelines and regulations, and written informed consent was obtained from all subjects. These tissue specimens were confirmed as ccRCC by 2–3 pathologists and stored in liquid nitrogen for further use. Total RNA was extracted from the tissues using the “Trizol” method, followed by reverse transcription using the EasyScript® All-in-One First-Strand cDNA Synthesis SuperMix for qPCR kit. Real-time fluorescence quantitative PCR (qRT-PCR) was performed using the PerfectStart® Green qPCR SuperMix (+ Dye I) kit. The mRNA expression of KLF genes in ccRCC and adjacent tissues was measured three times for validation.

Cell lines and cell culture

The six cell lines used in the present study (HK-2, A-498, ACHN, OS-RC-2, 786-O, and Caki-1) were obtained from ATCC. Following the RNA extraction and reverse transcription, qRT-PCR was performed, as mentioned above. Subsequently, the mRNA expression levels of KLFs were validated in both HK-2 (human renal cortical proximal tubular epithelial cells) and human renal cancer cell lines.

RNA and real-time fluorescent quantitative PCR

The total RNA from tissues and cell lines was dissolved in RNase-free water, and the concentration was adjusted to approximately 500 ng/μl. The primer sequences utilized for KLFs (Table 1 ) were provided by ShangHai Sangon Biotech, with β-actin as an internal reference. The average amplification CT values and corresponding dissolution curves of three replicates were obtained to calculate the expression of respective members of KLFs.
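The text above reports averaging triplicate CT values against β-actin but does not state the formula used to convert them into expression levels. A minimal sketch, assuming the commonly used 2^-ΔΔCT calculation (an assumption, not confirmed by the source) and using hypothetical CT values, might look like this:

```python
from statistics import mean


def relative_expression(target_ct, reference_ct):
    """Return 2^-(dCt), where dCt = mean target CT - mean reference CT."""
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2 ** (-delta_ct)


def fold_change(tumor_target, tumor_ref, normal_target, normal_ref):
    """Return 2^-(ddCt): tumor expression relative to adjacent tissue."""
    tumor = relative_expression(tumor_target, tumor_ref)
    normal = relative_expression(normal_target, normal_ref)
    return tumor / normal


# Hypothetical triplicate CT values (illustrative only, not study data).
tumor_klf = [27.0, 27.2, 26.8]    # target gene in tumor tissue
tumor_actb = [18.0, 18.1, 17.9]   # beta-actin in tumor tissue
normal_klf = [25.0, 25.1, 24.9]   # target gene in adjacent tissue
normal_actb = [18.0, 17.9, 18.1]  # beta-actin in adjacent tissue

fc = fold_change(tumor_klf, tumor_actb, normal_klf, normal_actb)
print(f"fold change (tumor vs adjacent): {fc:.3f}")
```

With these made-up values, the target gene crosses threshold two cycles later in tumor than in adjacent tissue after normalization, which corresponds to roughly a four-fold lower expression.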

Introduction and application of relevant online databases

The GEPIA database ( http://gepia.cancer-pku.cn/ ) was employed to analyze the RNA sequencing data from TCGA and GTEx databases. The database compiles data from 9,736 tumor tissue and 8,587 normal tissue samples, and the project is technically supported by the Advanced Innovation Center for Genomics at Peking University. We utilized this database for the comparative analyses of the correlation of KLFs at various stages of ccRCC.

UALCAN ( http://ualcan.path.uab.edu/index.html ) includes post-methylation and differential expression data of KLFs in ccRCC and adjacent tissues. This database analyzes data from TCGA, MET500, CPTAC, and CBTTC databases, and provides the expression profiles of protein-coding, lincRNA-coding, and miRNA-coding genes with patient survival information. In addition, it presents data analyses from the Clinical Proteomic Tumor Analysis Consortium, including total and phosphorylated protein data. We utilized this database for the analysis of KLFs' methylation levels, status of lymph node metastasis, and expression in ccRCC subtypes and grades.

cBioPortal ( https://www.cbioportal.org/ ) offers online visualization tools for the analysis of cancer-related genetic data and to identify molecular data based on cancer histological and cytological studies. It compiles data from the TCGA, GDAC, UCSC, and others, to provide information on network connections and interactions between cancer mutations. We employed this database to analyze the probability, patterns, and potential sites of mutations in KLFs. In addition, we examined the effect of such mutations on the overall and disease-specific survival (DSS) before and after the mutation. Subsequently, 50 genes were identified as most relevant to the KLF mutations.

GSCALite is an online analytical tool developed by Professor Guo Anyuan's team at Huazhong University of Science and Technology, which provides data on differential expression, methylation, and survival analysis. We used this tool to identify miRNA molecules associated with KLFs and to create a network diagram. Additionally, we integrated drug sensitivity and gene expression profile data from cancer cell lines in the CTRP database to study the drug sensitivity of KLFs. Spearman’s coefficient was used to correlate the expression of each gene in the gene set with small molecule/drug sensitivity (IC50). KLF17 could not be analyzed due to its unavailability in the CTRP database. Finally, pathway activity modules were used to analyze KLFs-related pathways and generate a visual representation of the global percentage.
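To illustrate the correlation step described above, the following sketch computes Spearman's coefficient between gene expression and IC50 across cell lines. It is a plain-Python illustration, not GSCALite's implementation; the function names and the data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression and IC50 values across five cell lines.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
ic50 = [9.0, 7.5, 8.0, 3.0, 1.0]
rho = spearman(expression, ic50)
print(f"Spearman rho = {rho:.2f}")
```

A negative coefficient in this setting means that higher expression goes with a lower IC50, that is, with greater drug sensitivity.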

Designated as a core data resource by ELIXIR and the Global Alliance for Genomics and Health, STRING ( https://string-db.org/ ) comprises a comprehensive collection of known and predicted protein/gene interactions, and is capable of generating interaction network diagrams for known genes or proteins. We utilized this database to analyze the 50 genes associated with KLF mutations and to generate an interaction network diagram.

Tumor Immune Single-cell Hub (TISCH, http://tisch1.comp-genomics.org) is a scRNA-seq database designed to characterize the tumor microenvironment (TME) at single-cell resolution. TISCH provides detailed cell type annotations at the single-cell level, enabling the exploration of the TME in different cancer types, and its data are mainly derived from the GEO database.

TIMER (https://cistrome.shinyapps.io/timer/) is designed to identify the immune cell infiltration in tumor tissues. Based on the RNA-Seq expression profile, it provides comprehensive information on the infiltration of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells. We utilized this tool to analyze the relationship between KLF expression levels, overall immune infiltration, and levels of the six immune cell types. Additionally, we examined the levels of infiltration in the somatic cells expressing KLFs.

Statistical analysis

Data analysis utilized SPSS 26.1 and R 4.1.3 software. Continuous variables were represented as mean and standard deviation, whereas percentage (%) was used for categorical variables. Further, Wilcoxon signed-rank and rank-sum tests were utilized for comparing the paired and unpaired samples, respectively.
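For a small paired design such as 10 tumor/adjacent pairs, the Wilcoxon signed-rank test can be computed exactly by enumerating every sign assignment. The sketch below is a plain-Python illustration of that test with hypothetical data; it does not reproduce the SPSS or R routines used in the study.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Return (W+, exact two-sided p) for paired samples x and y."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    hits = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((False, True), repeat=len(ranks)):
        w = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(w, total - w) <= observed:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)


# Hypothetical relative expression in 10 paired samples (not study data).
adjacent = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.0, 5.3, 5.9]
tumor = [3.9, 4.1, 4.4, 4.0, 4.2, 4.3, 4.5, 3.8, 4.6, 4.7]
w_plus, p_value = wilcoxon_signed_rank_exact(adjacent, tumor)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.5f}")
```

Because every tumor value here is lower than its paired adjacent value, only the two most extreme sign patterns out of 2^10 are at least as extreme as the observed one, which is the smallest two-sided p-value attainable with 10 pairs.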

Ethical approval

Our research protocol was approved by the Ethics Committee of the First Affiliated Hospital of Nanchang University. Ethics number: (2022) CDYFYYLK (10–011). Data were retrieved from online databases, and tissue samples were collected from patients admitted to the hospital's Department of Urology who had given permission for their samples to be used in research.

mRNA expression levels of KLFs in tumor and adjacent tissues

We analyzed the expression of KLFs in TCGA ccRCC and adjacent tissue samples and visualized the data using R (“stats” and “ggplot2” packages; Fig. 1 A and B for non-paired and paired samples, respectively). Notably, the expression of KLF1/6/7/8/14/16 was significantly higher in the tumor specimens than in the adjacent tissue samples (p < 0.001). Conversely, the expression of KLF5/9/15 was significantly lower in tumor samples than in the adjacent tissues (p < 0.001). Similarly, for non-paired samples, KLF3 was significantly downregulated (p < 0.001) and KLF17 significantly upregulated (p < 0.05) in tumors compared with the adjacent tissue samples.

figure 1

Differential expressions of KLFs in tumor and adjacent tissues in TCGA: ( A ) Non-paired samples; ( B ) Paired samples. Differential expression of KLF3/5 in 10 pairs of tumor and adjacent tissues ( C , E ) and cell lines ( D , F ).

Subsequently, the qPCR results of 10 pairs of renal cancer and adjacent tissues revealed significant differences in the expression of KLFs. While KLF3/5/9/15 were downregulated in the renal cancer tissues (Figs.  1 C,E and 2 A,C), the expression of KLF16 was upregulated (Fig.  2 E) and that of KLF17 remained unaltered (Fig.  2 G). Moreover, the expressions of KLF3/5/9/15 were downregulated in the majority of the renal cancer cell lines (Figs.  1 D,F and 2 B,D), while the expression of KLF16/17 in various cell lines varied non-significantly (Fig.  2 F,H).

figure 2

Expression of KLFs in ccRCC tissue samples and renal cancer cell lines: ( A , C , E , G ) ccRCC tissue samples; ( B , D , F , H ) Renal cancer cell lines.

Clinical relevance of members of KLFs

Furthermore, we analyzed the expression of KLFs in patients with ccRCC based on TCGA data to assess their prognostic value and to generate the Kaplan–Meier survival curves (Fig. 3 A–L). Lower mRNA levels of KLF3 (HR = 0.66, 95% CI 0.48–0.89 and p = 0.006), KLF5 (HR = 0.63, 95% CI 0.46–0.85 and p = 0.003), KLF9 (HR = 0.47, 95% CI 0.34–0.64 and p < 0.001), and KLF15 (HR = 0.67, 95% CI 0.49–0.90 and p = 0.009) were associated with poor overall survival (OS) in patients with ccRCC (Fig. 3 A,D,G,J). Similarly, lower mRNA levels of KLF3 (HR = 0.50, 95% CI 0.33–0.74 and p < 0.001), KLF5 (HR = 0.65, 95% CI 0.44–0.95 and p = 0.025), KLF9 (HR = 0.38, 95% CI 0.25–0.57 and p < 0.001), and KLF15 (HR = 0.57, 95% CI 0.39–0.84 and p = 0.005) were associated with a shorter disease-specific survival (DSS; Fig. 3 B,E,H,K), whereas lower mRNA levels of KLF3 (HR = 0.59, 95% CI 0.43–0.81 and p = 0.001), KLF5 (HR = 0.57, 95% CI 0.41–0.78 and p < 0.001), and KLF9 (HR = 0.45, 95% CI 0.33–0.63 and p < 0.001) were significantly correlated with a shorter progression-free interval (PFI; Fig. 3 C,F,I). In contrast, upregulated expressions of KLF16 (HR = 1.52, 95% CI 1.12–2.06 and p = 0.007) and KLF17 (HR = 1.72, 95% CI 1.27–2.33 and p < 0.001) were associated with a poor OS (Fig. 4 A,D). Similarly, higher mRNA levels of KLF16 (HR = 1.75, 95% CI 1.19–2.59 and p = 0.005) and KLF17 (HR = 1.82, 95% CI 1.23–2.68 and p = 0.003) were associated with a poor DSS (Fig. 4 B,E). Moreover, higher mRNA levels of KLF16 (HR = 1.57, 95% CI 1.14–2.15 and p = 0.006) and KLF17 (HR = 1.39, 95% CI 1.02–1.90 and p = 0.039) were associated with a shorter PFI (Fig. 4 C,F). Collectively, these results suggested a significant correlation between the expression of KLF3/5/9/15/16/17 genes and the prognosis of ccRCC, making them potential biomarkers for predicting the survival probability of patients with ccRCC.
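As a rough illustration of how a Kaplan–Meier curve of the kind described above is built, the following Python sketch computes the product-limit estimate for one group of patients. It is a simplified sketch rather than the authors' pipeline; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time of each patient
    events : 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) for a "low expression" group.
low_times = [2, 3, 3, 5, 8]
low_events = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(low_times, low_events):
    print(f"t = {t}: S(t) = {s:.2f}")
```

Running the same function on a "high expression" group and comparing the two step curves gives the kind of visual contrast that the hazard ratios above summarize numerically.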

figure 3

Correlation between expression levels of KLF3/5/9/15 and the prognosis of ccRCC patients: ( A , D , G , J ) Overall Survival (OS); ( B , E , H , K ) Disease-Specific Survival (DSS); ( C , F , I , L ) Progression-Free Interval (PFI).

figure 4

Correlation between expression levels of KLF16/17 and the prognosis of ccRCC patients: ( A , D ) Overall Survival (OS); ( B , E ) Disease-Specific Survival (DSS); ( C , F ) Progression-Free Interval (PFI). ( G – L ) Correlation between expression levels of KLFs and pathological stages of ccRCC.

Next, we analyzed the correlation between the mRNA levels of KLFs and clinical-pathological features (individual pathological staging). As shown in Fig.  4 G–L, expression of KLF16 was not significantly correlated with the pathological stages of ccRCC ( p  > 0.05). However, the expression of KLF3/5/9/15 decreased with the advancing pathological stages of ccRCC ( p  < 0.001). Conversely, the expression of KLF17 increased with the higher pathological stage of ccRCC ( p  < 0.001). These results suggested the potential of KLF3/5/9/15 as tumor-suppressing and KLF17 as tumor-promoting genes in ccRCC.

Subsequently, the methylation levels of KLFs mRNA, lymph node metastasis, and expression in different tumor subtypes and grades were evaluated by using the UALCAN database. As shown in Fig.  5 A–F, the methylation levels of KLF3/5/15 were significantly higher in ccRCC than in adjacent tissues ( p  < 0.001), whereas those of KLF16/17 were significantly lower ( p  < 0.001). In Fig.  5 J–L, compared with the adjacent tissues, expressions of KLF3/5/9/15 were decreased in ccRCC tissues with increased lymph node metastasis. Moreover, in different ccRCC subtypes, KLF3/5/9/15 were significantly downregulated (Fig.  6 A–F), while KLF16/17 were significantly upregulated in ccRCC. In addition, the levels of KLF3/9/15/17 varied significantly in different subtypes ( p  < 0.001). Increased tumor grades were accompanied by decreased levels of KLF3/5/9/15 (Fig.  6 G–L) and increased levels of KLF16. These results further suggested the potential roles of KLF3/5/9/15 and KLF16/17 as tumor-suppressing and tumor-promoting genes, respectively, in ccRCC.

figure 5

Methylation levels of KLF genes in ccRCC ( A – F ). Differential expressions of KLF genes in ccRCC lymph node metastasis:( G – L ).

figure 6

Differential mRNA expression of KLF genes in ccRCC subtypes ( A – F ). Differential mRNA expression of KLF genes in different grades of ccRCC ( G – L ).

Combined analysis of differential expression of the KLF genes and tumor grading predicted the survival time of patients with ccRCC. Results revealed that lower expression of KLF3/5/9/15 and higher expression of KLF16/17 were significantly correlated with a decreased survival period (Fig.  7 A–D, P  < 0.001).

figure 7

Impact of mRNA levels of KLF genes on the survival of patients with ccRCC based on tumor grading.

Development and validation of a nomogram prognostic model

After selecting independent prognostic factors, including age, pathological M stage, and KLF5, through Cox regression analysis, we constructed a nomogram prognostic model (Fig. 8 A). The AUC values for the ROC curves at 1, 3, and 5 years were 0.89 (0.83–0.95), 0.82 (0.75–0.88), and 0.82 (0.75–0.88), respectively (Fig. 8 B). The calibration curves of the prediction model for 1, 3, and 5 years fitted the diagonal line well (Fig. 8 C), indicating a high accuracy of the prediction model. Further validation using an external database (E-MTAB-1980) 12 was satisfactory (Fig. 8 D–E). The proportional hazards assumption of the constructed model was tested and the result was visualized using R (“ggplot2” package). Results suggested that the model met the proportional hazards assumption (Supplementary Fig. 1 A,B).
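To make the AUC values above concrete, the sketch below computes an AUC as the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it. This is a plain (non-time-dependent) AUC that ignores censoring, so it only approximates the 1-, 3-, and 5-year analysis reported here; the function name, risk scores, and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model risk scores and event status (1 = died within the time window).
risk_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
died = [1, 0, 1, 0, 0]

print(f"AUC = {roc_auc(risk_scores, died):.2f}")
```

An AUC of 0.5 corresponds to a model that ranks patients no better than chance, and 1.0 to perfect ranking, which is the scale on which the reported values should be read.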

figure 8

Nomogram line chart prognostic model ( A ). ROC curves (Receiver Operating Characteristic) for the training set ( B ) and calibration curves ( C ). ROC curves for the validation set ( D ) and calibration curves ( E ).

Neighboring genes associated with KLF mutations

GSCALite generated a network diagram showing the interactions between KLF3/5/9/15 and numerous miRNAs (Fig. 9 A). Additionally, the cBioPortal identified 50 adjacent genes associated with KLF mutations and created a corresponding protein interaction network using the STRING tool (Fig. 9 B). Subsequently, R software (“ggplot2” package) was employed to perform KEGG and GO functional enrichment analyses for KLFs and the 50 adjacent genes (Fig. 9 C). Biological processes (BP) included genes for acyl-CoA biosynthesis (GO:0071616), thioester biosynthesis (GO:0035384), acetyl-CoA metabolism (GO:0006084), and acetyl-CoA biosynthesis (GO:0006085). In addition, cellular components (CC) included GO:0007044 (cell-substrate junction component), GO:0048041 (focal adhesion component), GO:0035579 (specific granule membrane), and GO:1905286 (serine-type peptidase complex). Moreover, KEGG pathways included hsa05202 (transcriptional misregulation in cancer) and hsa00410 (butanoate metabolism). Notably, these processes are associated with KLF mutations in ccRCC.

figure 9

Network diagram of interactions between KLFs and miRNAs ( A ). Protein–protein interaction (PPI) network of 50 adjacent genes associated with KLF mutations ( B ); Functional analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways ( C ).

Single-cell analysis of the KLFs family in ccRCC

The selected single-cell analysis dataset (KIRC_GSE121636) contained a large number of CD4 Tconv cells (Fig. 10 A). While a majority of the immune cells expressed KLF3/9 genes, a relatively higher distribution was observed for CD4 Tconv cells, CD8 T cells, and NK cells (Fig. 10 B,C,E). Conversely, cells expressing KLF5/16 genes were less prevalent (Fig. 10 D,F).

figure 10

Proportion distribution of single-cell data from KIRC_GSE121636 ( A ). Cellular type distribution in KIRC_GSE121636 ( B ). Cellular distribution of KLF3 ( C ), KLF5 ( D ), KLF9 ( E ), and KLF16 ( F ).

Association of KLF gene mutations with OS and PFS

We employed cBioPortal to assess the correlation of mutations in KLFs with OS and PFS in patients with ccRCC. Figure 11 A shows the genetic alterations in KLFs associated with ccRCC. The Kaplan–Meier curves showed that the KLF mutations were significantly related to shorter OS (Fig. 11 B, P = 4.415e−3) and PFS (Fig. 11 C, P = 3.571e−3) in patients with ccRCC. Consequently, mutations in KLFs may significantly affect the survival prognosis of patients with ccRCC.

figure 11

KLF mutations in ccRCC ( A ). Kaplan–Meier plots for overall survival ( B ) and progression-free survival ( C ) in ccRCC patients with or without KLF mutations.

Association of immune cell infiltration with the KLFs in ccRCC

The TIMER database evaluated the correlation between immune cell infiltration and KLF gene expression. As shown in Fig.  12 A–F, infiltration of B cells, CD8 + T cells, CD4 + T cells, macrophages, neutrophils, and dendritic cells positively correlated with the KLF3/5 expression. Similarly, KLF9 expression positively correlated with the infiltration of CD8 + T cells, CD4 + T cells, macrophages, neutrophils, and dendritic cells, and KLF15 expression positively correlated with the infiltration of CD4 + T cells, neutrophils, and dendritic cells. Conversely, expression of KLF16 negatively correlated with the infiltration of B cells, CD4 + T cells, and dendritic cells. Similarly, KLF17 expression negatively correlated with the infiltration of CD4 + T cells, macrophages, neutrophils, and dendritic cells.

figure 12

Correlation between KLF genes and immune cell infiltration ( A – F ).

Drug sensitivity and related pathways of action in members of KLFs

We employed GSCALite (a total of 481 molecules selected from the CTRP database) to analyze the drug sensitivity of KLFs. Spearman’s correlation coefficient was used to assess the correlation between the expression of each KLF gene and sensitivity to small molecules. Since the drug sensitivity data for KLF17 were not available in the CTRP database, the respective results were not displayed. The results (Fig. 13 A) revealed that the expression of KLF3/5 was positively correlated with the small molecules/drugs, suggesting that a higher expression of KLF3/5 was associated with a greater resistance to these drugs. Conversely, the expression of KLF15/16 was negatively correlated with small molecules/drugs, suggesting that a higher expression of KLF15/16 was associated with increased sensitivity. However, the association of KLF9 with drug resistance was non-significant.
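The Spearman correlation used in this analysis can be sketched in a few lines of Python: both variables are converted to ranks and the Pearson correlation of the ranks is taken. This is an illustrative sketch, not GSCALite's implementation; the function names are hypothetical, and the drug-response values (assumed here to be something like IC50, where larger means more resistant) are invented toy data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical gene expression and drug-response values across five cell lines.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
drug_response = [2.1, 2.0, 3.5, 3.9, 6.0]

print(f"Spearman rho = {spearman_rho(expression, drug_response):.2f}")
```

Under the stated assumption about the response measure, a positive rho means that cell lines with higher expression tend to need more drug, which is how a positive correlation is read as resistance in the paragraph above.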

figure 13

Drug sensitivity analysis of KLFs (CTRP) ( A ). Pathway analysis of KLFs ( B ).

We utilized the pathway activity module in GSCALite to analyze the relevant pathways of KLFs (Fig.  13 B). While KLF3 was associated with the Receptor Tyrosine Kinase (RTK) pathway, KLF9 was involved in the DNA Damage Response pathway, KLF15 in the Hormone ER pathway, KLF16 in the RTK pathway, and KLF5 and KLF17 influenced the Epithelial-Mesenchymal Transition (EMT) pathway.

Cellular experiments verified the phenotypic changes of renal cancer cells after overexpression of KLF5

After overexpression of the KLF5 gene in the renal cancer cell lines 786-O and OS-RC-2, both cell lines showed reduced proliferation (Fig. 14 A) and migration (Fig. 14 B) and were more sensitive to small-molecule inhibitors, such as the predicted inhibitor CCT036477 (Fig. 14 C).

figure 14

Cellular assay: renal cancer cell proliferation assay ( A ). Renal cancer cell scratching assay ( B ). Drug sensitivity test of renal cancer cells ( C ).

The KLF family comprises 18 members (KLF1–KLF18) and represents a subfamily of mammalian Sp/KLF zinc-finger proteins. They play crucial roles in transcription through the interactions of their highly conserved DNA-binding domains (DBD) or C-terminal regions with the G/C and CACCC boxes of the target genes 13 . KLFs are involved in tumor cell proliferation, invasion, and metastasis by serving as transcriptional activators or repressors, depending on the type of regulatory proteins they bind 14 . The KLF family proteins are associated with the formation of fat tissue and muscles, nervous system development, tumor formation, cellular and tissue metabolism, and supporting damage repair at cellular, tissue, and systemic levels 15 , 16 . Each KLF gene has a unique structure and is specifically expressed and regulated in different tissues, times, and environments 17 . While the role of some of the KLF genes has been reported in ccRCC, the impact of their expression on ccRCC remains unclear 18 . The present study investigated the effects of mRNA expression, methylation, protein expression, gene mutation, and immune infiltration of KLFs on ccRCC.

Under normal physiological conditions, KLF3 acts as a transcriptional repressor and is involved in various cellular processes, including adipocyte differentiation 19 , epidermal differentiation 20 , erythropoiesis, and B-cell development 21 . Previous studies suggest that KLF3 is primarily expressed in CD4 Tconv cells, CD8 T cells, monocytes/macrophages, endothelial cells, and malignant cells in most tumor microenvironments 22 . KLF3 is aberrantly expressed in various diseases and is involved in related pathways. Therefore, its study would contribute to an in-depth understanding of the pathogenesis of these diseases and provide new insights and avenues for their treatment. In the present study, patients with differential expressions of KLF3 exhibited significantly different survival periods. Specifically, those with a decreased expression showed significantly decreased OS, DSS, and PFI, suggesting its tumor-suppressing role. Additionally, a similar effect of KLF3 was evident in the subgroup analysis of ccRCC. Moreover, our study suggests that KLF3 was associated with a higher immune cell infiltration. Studies have shown that KLF3 expression is closely related to the infiltration of CD4 + T cells, CD8 + T cells, neutrophils, myeloid dendritic cells, monocytes/macrophages, and endothelial cells in the tumor microenvironment, suggesting that KLF3 plays an important role in the tumor microenvironment 22 . Therefore, it can be a potential diagnostic target for ccRCC in future studies.

Studies also suggest the ability of KLF5 to modulate the tumor microenvironment 23 . KLF5 regulates the expression of various target genes and participates in cellular functions, including stemness, proliferation, apoptosis, autophagy, and migration 24 . It is an essential transcription factor in cardiovascular remodeling, thereby serving as a potential therapeutic target 25 . In our study, KLF5 was expressed at lower levels in paired and unpaired ccRCC tissue samples collected from the TCGA database. Similarly, it was downregulated in renal cancer cell lines. Moreover, our results confirmed a downregulated KLF5 expression in ccRCC tissue compared with the adjacent tissues. Elevated levels of KLF5 were associated with longer OS, DSS, and PFI in ccRCC, suggesting its role as a tumor suppressor. Additionally, our results revealed a positive correlation between KLF5 expression and immune cell infiltration in ccRCC, suggesting its role in regulating tumor immunity. The potential of immune cells in regulating tumor growth is well-established, and immune cell infiltration around tumors is highly significant. Reports suggest that KLF5 inactivation delays the growth of basal-like breast tumors in a CD8 T cell-dependent manner 26 . Knocking down KLF5 can make tumors sensitive to PD-1 blocking by increasing CD4 + and CD8 + T cells while reducing bone marrow-derived cells 27 . Collectively, these findings suggest KLF5 as a promising prognostic and therapeutic target for patients with ccRCC.

KLF9 is involved in transcriptional regulation and plays a crucial role in various cellular processes such as proliferation, differentiation, and the development of tissues and organs 28 . KLF9 is downregulated in various tumor tissues and cancer cells, thereby regulating cancer cell proliferation and apoptosis 29 . According to a study, KLF9 also participates in tumor cell invasion and metastasis 30 . Our results also showed downregulated levels of KLF9 in non-paired and paired tumor tissues. Moreover, patients with high KLF9 mRNA expression showed significantly longer OS, DSS, and PFI. Additionally, its expression decreased with an increase in the tumor grade.

KLF15 plays a crucial role in various biological processes, including the regulation of lipid metabolism, plasma corticosteroid transport, inflammatory responses 31 , and inhibition of cardiac hypertrophy 32 . Our results reported downregulated expression of KLF15 in both non-paired and paired tumor tissues and increased OS and DSS with increased KLF15 mRNA levels. This demonstrates the anti-cancer effects of KLF15 and its potential as a discriminatory factor for malignant tumors in ccRCC based on its relative expression levels.

KLF16 regulates dopaminergic transmission, metabolism, and related endocrinology. Recent studies have shown the involvement of KLF16 in various disease mechanisms, including insulin resistance and hepatic steatosis 33 , oxidative stress, and inflammation 34 . In our study, the mRNA levels of KLF16 were upregulated in non-paired and paired tumor tissues. Furthermore, an upregulated expression of KLF16 was associated with shorter OS, DSS, and PFI. Consequently, KLF16 may have an oncogenic role, and its relative expression levels may help to identify ccRCC.

Furthermore, research suggests that KLF17 is commonly downregulated in various cancer types, including colorectal, breast, lung, esophageal, hepatocellular, and gastric cancers 35 , 36 , 37 , 38 , 39 , 40 . Our study results revealed an upregulated KLF17 expression in the non-paired tumor tissues. In addition, higher KLF17 expression was associated with shorter OS, DSS, and PFI, contrary to the findings observed in the aforementioned cancers.

We propose that the KLF family genes, especially KLF5, hold significant potential as prospective biomarkers and therapeutic targets for ccRCC, providing crucial implications for the diagnosis and prognosis of this disease.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Dy, G. W., Gore, J. L., Forouzanfar, M. H., Naghavi, M. & Fitzmaurice, C. Global burden of urologic cancers, 1990–2013. Eur. Urol. 71 (3), 437–446 (2017).

Tian, Y. Q. et al. Trends and risk factors of global incidence, mortality, and disability of genitourinary cancers from 1990 to 2019: Systematic analysis for the global burden of disease study 2019. Front. Public Health 11 , 1119374 (2023).

Zi, H. et al. Global, regional, and national burden of kidney, bladder, and prostate cancers and their attributable risk factors, 1990–2019. Mil. Med. Res. 8 (1), 60 (2021).

Hsieh, J. J. et al. Renal cell carcinoma. Nat. Rev. Dis. Primers 3 , 17009 (2017).

Janzen, N. K., Kim, H. L., Figlin, R. A. & Belldegrun, A. S. Surveillance after radical or partial nephrectomy for localized renal cell carcinoma and management of recurrent disease. Urol. Clin. North Am. 30 (4), 843–852 (2003).

Black, A. R., Black, J. D. & Azizkhan-Clifford, J. Sp1 and kruppel-like factor family of transcription factors in cell growth regulation and cancer. J. Cell Physiol. 188 (2), 143–160 (2001).

Safe, S. & Abdelrahim, M. Sp transcription factor family and its role in cancer. Eur. J. Cancer 41 (16), 2438–2448 (2005).

Narla, G. et al. KLF6, a candidate tumor suppressor gene mutated in prostate cancer. Science 294 (5551), 2563–2566 (2001).

Wei, D., Kanai, M., Huang, S. & Xie, K. Emerging role of KLF4 in human gastrointestinal cancer. Carcinogenesis 27 (1), 23–31 (2006).

Hamik, A. et al. Kruppel-like factor 4 regulates endothelial inflammation. J. Biol. Chem. 282 (18), 13769–13779 (2007).

Mallipattu, S. K. et al. Kruppel-like factor 6 regulates mitochondrial function in the kidney. J. Clin. Invest. 125 (3), 1347–1361 (2015).

Shi, Z. et al. Identification and validation of a novel ferroptotic prognostic genes-based signature of clear cell renal cell carcinoma. Cancers (Basel) 14 (19), 4690 (2022).

Xiong, Q., Ruan, X. Y. & Fang, X. D. Progress on Sp1-like and Kruppel-like factors. Hereditas (Beijing) 32 (6), 531–538 (2010).

Kaczynski, J., Cook, T. & Urrutia, R. Sp1- and Kruppel-like transcription factors. Genome Biol. 4 (2), 206 (2003).

Moore, D. L. et al. KLF family members regulate intrinsic axon regeneration ability. Science 326 (5950), 298–301 (2009).

Wu, Z. & Wang, S. Role of kruppel-like transcription factors in adipogenesis. Dev. Biol. 373 (2), 235–243 (2013).

Pearson, R., Fleetwood, J., Eaton, S., Crossley, M. & Bao, S. Kruppel-like transcription factors: A functional family. Int. J. Biochem. Cell Biol. 40 (10), 1996–2001 (2008).

Rane, M. J., Zhao, Y. & Cai, L. Krüppel-like factors (KLFs) in renal physiology and disease. EBioMedicine 40 , 743–750 (2019).

He, C. et al. Overexpression of Krueppel like factor 3 promotes subcutaneous adipocytes differentiation in goat Capra hircus. Anim. Sci. J. 92 (1), e13514 (2021).

Jones, J. et al. KLF3 mediates epidermal differentiation through the epigenomic writer CBP. iScience 23 (7), 101320 (2020).

Pearson, R. C., Funnell, A. P. & Crossley, M. The mammalian zinc finger transcription factor Kruppel-like factor 3 (KLF3/BKLF). IUBMB Life 63 (2), 86–93 (2011).

Zhu, J. et al. Pan-cancer analysis of Kruppel-like factor 3 and its carcinogenesis in pancreatic cancer. Front. Immunol. 14 , 1167018 (2023).

Wei, R. et al. Ketogenesis attenuates KLF5-dependent production of CXCL12 to overcome the immunosuppressive tumor microenvironment in colorectal cancer. Cancer Res. 82 (8), 1575–1588 (2022).

Luo, Y. & Chen, C. The roles and regulation of the KLF5 transcription factor in cancers. Cancer Sci. 112 (6), 2097–2117 (2021).

Xie, Z. et al. Current knowledge of Krüppel-like factor 5 and vascular remodeling: Providing insights for therapeutic strategies. J. Mol. Cell Biol. 13 (2), 79–90 (2021).

Wu, Q. et al. KLF5 inhibition potentiates anti-PD1 efficacy by enhancing CD8(+) T-cell-dependent antitumor immunity. Theranostics 13 (4), 1381–1400 (2023).

Li, J. et al. Epigenetic and transcriptional control of the epidermal growth factor receptor regulates the tumor immune microenvironment in pancreatic cancer. Cancer Discov. 11 (3), 736–753 (2021).

Avila-Mendoza, J., Subramani, A. & Denver, R. J. Kruppel-like factors 9 and 13 block axon growth by transcriptional repression of key components of the cAMP signaling pathway. Front. Mol. Neurosci. 13 , 602638 (2020).

Simmen, F. A., Su, Y., Xiao, R., Zeng, Z. & Simmen, R. C. The Kruppel-like factor 9 (KLF9) network in HEC-1-A endometrial carcinoma cells suggests the carcinogenic potential of dys-regulated KLF9 expression. Reprod. Biol. Endocrinol. 6 , 41 (2008).

Li, Y. et al. KLF9 suppresses gastric cancer cell invasion and metastasis through transcriptional inhibition of MMP28. FASEB J. 33 (7), 7915–7928 (2019).

Jiang, Z. et al. KLF15 cistromes reveal a hepatocyte pathway governing plasma corticosteroid transport and systemic inflammation. Sci. Adv. 8 (10), eabj2917 (2022).

Leenders, J. J. et al. Regulation of cardiac gene expression by KLF15, a repressor of myocardin activity. J. Biol. Chem. 285 (35), 27449–27456 (2010).

Sun, N. et al. Hepatic Kruppel-like factor 16 (KLF16) targets PPARalpha to improve steatohepatitis and insulin resistance. Gut 70 (11), 2183–2195 (2021).

Xin, Y. et al. Knock out hepatic Kruppel-like factor 16 (KLF16) improve myocardial damage and promoted myocardial protection of myocardial ischemia-reperfusion via anti-oxidative and anti-inflammation effects by TFAM/PPARbeta signal passage. Bioengineered 12 (2), 10219–10231 (2021).

Ali, A. et al. Tumor-suppressive p53 signaling empowers metastatic inhibitor KLF17-dependent transcription to overcome tumorigenesis in non-small cell lung cancer. J. Biol. Chem. 290 (35), 21336–21351 (2015).

Ismail, I. A., Kang, H. S., Lee, H. J., Kim, J. K. & Hong, S. H. DJ-1 upregulates breast cancer cell invasion by repressing KLF17 expression. Br. J. Cancer 110 (5), 1298–1306 (2014).

Ali, A. et al. KLF17 empowers TGF-beta/Smad signaling by targeting Smad3-dependent pathway to suppress tumor growth and metastasis during cancer progression. Cell Death Dis. 6 (3), e1681 (2015).

Jiang, X. et al. Clinical significance and biological role of KLF17 as a tumour suppressor in colorectal cancer. Oncol. Rep. 42 (5), 2117–2129 (2019).

Gumireddy, K. et al. KLF17 is a negative regulator of epithelial-mesenchymal transition and metastasis in breast cancer. Nat. Cell Biol. 11 (11), 1297–1304 (2009).

Peng, J. J. et al. Reduced Kruppel-like factor 17 (KLF17) expression correlates with poor survival in patients with gastric cancer. Arch. Med. Res. 45 (5), 394–399 (2014).

Acknowledgements

The authors would like to thank all the investigators who contributed to this study.

This work was supported by National Natural Science Foundation of China (82060148).

Author information

These authors contributed equally: MengRu Fu and YuanZhuo Du.

Authors and Affiliations

Department of Nephrology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330000, Jiangxi Province, China

MengRu Fu, Fei Liu, Jun Xiao, Li Zhang, Yan Zeng, YuJuan Yang & Yan Yan

Key Laboratory of Urinary System Diseases of Jiangxi Province, Nanchang, China

MengRu Fu, YuanZhuo Du, Fei Liu & Yan Yan

Department of Urology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi Province, China, 330000

YuanZhuo Du

Contributions

M.R.F. and Y.Z.D. designed the study. Y.Y. and J.X. supervised the research. M.R.F. performed the analysis and plotted the graphs. M.R.F. and F.L. conducted the experiments. Y.Z.D. collected clinical specimens. M.R.F. and Y.Z.D. wrote the manuscript. L.Z., Y.Z., and Y.J.Y. edited and reviewed the manuscript. All authors read and accepted the final manuscript.

Corresponding author

Correspondence to Yan Yan .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary legends.

Supplementary figure 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Fu, M., Du, Y., Liu, F. et al. Prognostic value of KLFs family genes in renal clear cell carcinoma. Sci Rep 14 , 20204 (2024). https://doi.org/10.1038/s41598-024-69892-5

Received : 19 February 2024

Accepted : 09 August 2024

Published : 30 August 2024

DOI : https://doi.org/10.1038/s41598-024-69892-5

  • Family genes
  • Survival prognosis
  • Online databases

genetic interactions experiments

COMMENTS

  1. Epistasis: Gene Interaction and Phenotype Effects

    Bateson and Punnett next performed a set of experiments in peas that also showed gene-gene interaction. The duo opted to use peas because it is relatively easy to perform hybrid crosses with these ...

  2. Exploring genetic interaction manifolds constructed from rich single

    Each individual cell is in effect an independent experiment connecting a genetic perturbation to its transcriptional consequences, allowing hundreds of thousands of parallel measurements (9, 10). It has been suggested that the rich phenotypes enabled by Perturb-seq can be used to better interpret the impact of genetic interactions .

  3. Multiple Genetic Interaction Experiments Provide Complementary

    A) Genetic interaction experiments differ in the phenotypic readout used, the environmental conditions and the laboratory where the experiment was conducted. B) Every network is compared to a common reference, the SGA network. For each of the 1480 genes in SGA that are also present in at least another network, we show which data set considered ...

  4. Using BEAN-counter to quantify genetic interactions from ...

    However, BEAN-counter is not limited to chemical-genetic interaction screening and should be applicable to experiments that follow the same general design principles as those described here and ...

  5. Exploring a Local Genetic Interaction Network Using Evolutionary Replay

    Evolutionary Replay Experiments Reveal the Range and Likelihood of Alternative Outcomes. Previously, we identified a strong positive genetic interaction that arose in a 1,000-generation laboratory-evolved yeast population (Lang et al. 2013). In this population (BYS2-E01 from Lang et al. 2011) a low frequency lineage persisted for hundreds of generations and acquired a KEL1 mutation (P344T) by ...

  6. A new era in functional genomics screens

    Lastly, single-cell genetic interaction mapping enables quantitative models of gene epistasis that reveal classes of genetic interactions that are difficult to measure by growth-based population ...

  7. Exploring a Local Genetic Interaction Network Using ...

    Using "evolutionary replay" experiments, we identified additional mutations that have positive genetic interactions with the kel1-P344T mutation. We replayed the evolution of this population 672 times from six timepoints. We identified 30 populations where the kel1-P344T mutation reached high frequency. We performed whole-genome sequencing on ...

  8. Multiple genetic interaction experiments provide complementary ...

    A genetic interaction is defined as a deviation from the expected phenotype when combining multiple genetic mutations. In Saccharomyces cerevisiae, most genetic interactions are measured under a single phenotype ...

  9. Exploring a local genetic interaction network using ...

    Understanding how genes interact is a central challenge in biology. Experimental evolution provides a useful, but underutilized, tool for identifying genetic interactions, particularly those that involve non-loss-of-function mutations or mutations in essential genes. We previously identified a strong positive genetic interaction between specific mutations in KEL1 (P344T) and HSL7 (A695fs) that ...

  10. Detecting genetic interactions using parallel evolution in experimental

    Genetic interactions are known to contribute to adaptive evolution, and the data from evolve and re-sequence experiments must contain information about these genetic interactions. To the best of our knowledge, only one study so far has leveraged this type of data to detect epistasis and demonstrate how it affected evolutionary trajectories [36].

  11. From genetic associations to genes: methods, applications, and

    Hence, developing high-precision enhancer-promoter interaction maps can facilitate gene prioritization [74] (Figure 2B). ... these experiments up due to the absence of cell- or tissue-specific experimental protocols and the complexity of the experiments. Given that enhancer-gene connection catalogs are far from complete, there is an urgent ...

  12. Genetic interaction networks: better understand to better predict

    A genetic interaction (GI) between two genes generally indicates that the phenotype of a double mutant differs from what is expected from each individual mutant. ... These markers are used in genetic mapping experiments and are assumed to have less probability of interacting together than pairs of genes randomly picked from the genome.

  13. Exploring genetic suppression interactions on a global scale

    Genetic interactions, in which mutations in two different genes combine to generate an unexpected phenotype, may underlie a significant component of trait heritability. ... We also removed suppression interactions derived from high-throughput experiments or dosage interactions in which either the query or the suppressor was overexpressed. The ...

  14. A global genetic interaction network maps a wiring diagram of ...

    We tested most of the ~6000 genes in the yeast Saccharomyces cerevisiae for all possible pairwise genetic interactions, identifying nearly 1 million interactions, including ~550,000 negative and ~350,000 positive interactions, spanning ~90% of all yeast genes. Essential genes were network hubs, displaying five times as many interactions as nonessential genes.

  15. Exploring a local genetic interaction network using ... (PDF)

    Because this genetic interaction is not phenocopied by gene deletion, it was previously unknown. Using "evolutionary replay" experiments we identified additional mutations that have positive genetic interactions with the kel1-P344T mutation. We replayed the evolution of this population 672 times from six timepoints.

  16. Epistasis

    Epistasis is fundamental to the structure and function of genetic pathways and to the evolutionary dynamics of complex genetic systems. High-throughput functional genomics, systems-level ...

  17. Using BEAN-counter to quantify genetic interactions from ...

    The construction of genome-wide mutant collections has enabled high-throughput, high-dimensional quantitative characterization of gene and chemical function, particularly via genetic and chemical-genetic interaction experiments. As the throughput of such experiments increases with improvements in se …

  18. ECD-CDGI: An efficient energy-constrained diffusion model for cancer

    Author summary: Cancer has become a major disease threatening human life and health. Cancer usually originates from abnormal gene activities, such as mutations and copy number variations. Mutations in cancer driver genes are crucial for the selective growth of tumor cells. Identifying cancer driver genes is crucial in cancer-related research and treatment strategies, as it helps understand ...

  19. Inferred from Genetic Interaction (IGI)

    2.1 Genetic interactions such as suppression, enhancement, synergistic (synthetic) interactions, etc. 2.2 Co-transfection experiments. 2.3 Expression of one gene affects the phenotype of a mutation in another gene. 3 Use of the With/From Field for IGI. 4 When IGI Should NOT be Used. 5 Quality Control Checks.

  20. Artificial neural network inference analysis identified novel genes and

    This study aims to identify genes, gene interactions, and molecular pathways and processes associated with muscle aging and exercise in older adults that remained undiscovered until now leveraging on an artificial intelligence approach called artificial neural network inference (ANNi). ... Further experiments including gene knockout models are ...

  21. Defining genetic interaction

    Genetic interactions have long been studied in model organisms as a means of identifying functional relationships among genes or their corresponding gene products, with the nature of these relationships depending on the types of interactions (1-3). Additionally, the extent and nature of genetic interaction are important to theoretical explanations for the selective advantage of sexual ...

  22. Discovering genetic interactions bridging pathways in genome-wide

    Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome ...

  23. The ALICE experiment: a journey through QCD

    The ALICE physics programme has been extended to cover a broader ensemble of observables related to Quantum Chromodynamics (QCD), the theory of strong interactions. The experiment has studied Pb-Pb, Xe-Xe, p-Pb and pp collisions in the multi-TeV centre of mass energy range, during the Run 1-2 data-taking periods at the LHC (2009-2018).

  24. Efficient gene knockout and genetic interaction screening ...

    Comparing dual-gene knockout studies and identifying synthetic lethal interactions. With the discovery that paralogs are both systematically underrepresented as hits in pooled library screens and ...

  25. Top 10 Replicated Findings from Behavioral Genetics

    Finally, we note that four of the top-10 findings (2, 7, 8 and 9) are about environmental influences rather than genetic influences. By using genetically sensitive designs such as twin studies, behavioral genetics has revealed almost as much about the environment as about genetics. 1. All psychological traits show significant and substantial ...

  26. Analytical Expert (ARD) (m/f/d)

    Major accountabilities: Designing, planning, supporting the execution as well as interpreting and reporting results of scientific experiments for the development and timely supply of drug substances (DS) and drug products (DP) intended for clinical use in late-stage development and potential commercialization. Writing and reviewing analytical documents (e.g., analytical procedures, specifications ...

  27. Human-water interactions associated to cercarial emergence pattern and

    Background: Mali is known to be a schistosomiasis-endemic country with a limited supply of clean water. This has forced many communities to rely on open freshwater bodies for many human-water contact (HWC) activities. However, the relationship between contact with these water systems and the level of schistosome infection is currently receiving limited attention. This study assessed human-water ...

  28. Gene-environment interactions and their impact on human health

    We then examine how these two factors can work together to increase disease risk. Fig. 1: Gene × environment (G × E) interactions involve synergy between environmental risk factors and genetic ...

  29. Prognostic value of KLFs family genes in renal clear cell ...

    Protein interaction network maps of 50 neighboring genes associated with KLF mutations were analyzed using STRING with GO and KEGG functional enrichment analyses.