Synteny – a high throughput web tool to streamline causal gene prioritisation and provide insight into protein function

Introduction

Advances in omics technologies have revolutionised the study and understanding of human disease. However, the complexity of the data generated by these techniques can obscure key biological insights and limit the ability of basic researchers to determine which findings are most deserving of experimental follow-up.

A major challenge is that omics studies – including case-control studies examining systemic differences in RNA or protein abundance, and correlational studies – frequently identify hundreds of disease-associated genes and many of these remain poorly characterised. Even a comprehensive review of the literature often provides insight into just a small subset of genes1,2,3. Combined with the fact that data analysis approaches like differential expression offer little insight into whether changes are causes or consequences of disease processes, researchers are often drawn to genes with known functions3,4. This ‘streetlight’ bias – searching for answers where the light is brightest rather than venturing into the relative unknown – impedes new breakthroughs and signals an unmet need for robust and unbiased tools to prioritise genes with the strongest causal links to human disease.

One approach that is becoming popular is to integrate omics data with findings from human genetic studies, since disease-associated genetic variants indicate causality and can provide independent support for the role of genes in human disease. Human genetic evidence encompasses several forms of association including genome-wide association studies (GWAS) as well as case-based analysis of rare disease pedigrees. Genetic and phenotypic data are now available across large human cohorts and several studies have been successful in applying such an integrated approach5,6,7. For example, Li et al.6 integrated gene co-expression networks from mice with GWAS data for human plasma lipids to identify regulators of hepatic cholesterol synthesis. The rationale was that genes co-expressed with established regulators of cholesterol synthesis in mice mark the same biological pathway, and, if those genes also map to lipid-associated human loci, they are more likely to be causal. This cross-species integration identified 54 genes with conserved regulatory relationships between mice and humans, meaning that the same network connections (e.g. to HMGCR, a rate-limiting step in cholesterol synthesis) were preserved across species. Of these, 25 genes had not been linked to lipid metabolism previously. In follow-up studies, 15/25 (60%) were transcriptionally regulated in response to changing cholesterol levels in mice and 9/25 (36%) regulated cholesterol levels in AML12 hepatocytes following siRNA knockdown, suggesting that they might provide novel therapeutic targets for dyslipidaemia. Despite these genes possessing strong human genetic evidence linking them to cholesterol synthesis prior to this study, potential therapeutic roles for these genes were only identified once human genetic associations had been analysed together with data from mice. 
This emphasises how integrating human genetic data with data from model systems, such as model organisms and cell culture, can yield greater insights than studying either in isolation.

However, in the absence of specialist bioinformatic training, the limited ability of basic scientists to access human genetic data remains a considerable challenge. Although many resources – including OpenTargets Genetics8, MARRVEL9 and GeneBass10 – allow users to query GWAS summary statistics for genes one-at-a-time, the significant time investment required to obtain data for multiple genes discourages researchers from querying all genes of interest. Even though several of these resources provide application programming interfaces (APIs) to enable users to acquire data for many genes at once, few basic researchers possess sufficient familiarity with APIs to use them effectively.

To improve the accessibility of human genetic data to basic researchers, we developed Synteny, a user-friendly tool that enables anybody with a list of gene names to leverage large-scale human genetic datasets to rank genes by human disease relevance. To the best of our knowledge, Synteny is the only tool that permits bulk download of human genetic data via a graphical interface. Although other gene prioritisation tools such as DAVID11, PANTHER12 and g:Profiler13 are compatible with bulk data querying, they do so using gene ontology and pathway information rather than human genetic evidence.

Using Synteny, data for several hundred genes from any of 8 common model organisms can be acquired in minutes. These include Drosophila melanogaster (fly), Mus musculus (mouse), Rattus norvegicus (rat), Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Xenopus laevis/tropicalis (frog), Danio rerio (zebrafish) and Gallus gallus (chicken), enabling use of the tool by a wide variety of researchers. Ortholog mapping is accomplished using the Alliance for Genome Resources database14 for all but Gallus gallus, where we manually curated mappings using the galGal6 assembly and the HUGO Gene Nomenclature Committee Ortholog Database16. Mapping of cross-species orthologs using these resources means that Synteny is compatible with most common gene name aliases. Cross-species orthologs are typically highly conserved in terms of structure and function15 and so this approach enables findings in model organisms to be leveraged to infer potential human disease mechanisms and gene function, and vice-versa.

Next, Synteny acquires human phenome-wide association statistics from the Association to Function (A2F) database (https://a2f.hugeamp.org/). The A2F database is a manually curated collection of GWAS datasets spanning a wide array of phenotypes across multiple disease systems. At the time of publication, A2F contained summary statistics for 898 genetic studies and 1175 phenotypes, including cardiovascular (n = 115), respiratory (n = 126), glycaemic (n = 102), renal (n = 104), neurological (n = 49), immune-related (n = 17) and cancer (n = 19) phenotypes, among others. The portal is regularly updated with current GWAS data, so interrogation of the human phenome will deepen as more data are added.

Results

Synteny powers improved access to human genetic evidence

While technical barriers once limited the ability to integrate human genetic data into broader biological analyses, recent approaches such as the Human Genetic Evidence (HuGE) framework by Dornbos et al.17 have greatly expanded access to these insights. The main benefit of HuGE scores is that they provide a single interpretable measure of the strength of association between every gene and every phenotype. HuGE scores ≥ 350 are classified as ‘compelling’, ≥ 100 as ‘extreme’, ≥ 30 as ‘very strong’, ≥ 10 as ‘strong’, ≥ 3 as ‘moderate’ and > 1 as ‘anecdotal’, while relationships with no evidence receive a score ≤ 1.
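
The classification above can be written as a simple lookup. The following is a minimal sketch in Python rather than part of Synteny itself; the function and constant names are illustrative assumptions, and only the thresholds are taken from the text.

```python
# Thresholds as stated in the text; names are hypothetical, for illustration only.
HUGE_CATEGORIES = [
    (350.0, "compelling"),
    (100.0, "extreme"),
    (30.0, "very strong"),
    (10.0, "strong"),
    (3.0, "moderate"),
]


def classify_huge_score(score: float) -> str:
    """Map a HuGE score to its evidence category."""
    # Categories are ordered from strongest to weakest, so the first match wins.
    for threshold, label in HUGE_CATEGORIES:
        if score >= threshold:
            return label
    # Scores above 1 (but below 3) are 'anecdotal'; scores of 1 or less carry no evidence.
    return "anecdotal" if score > 1.0 else "no evidence"


if __name__ == "__main__":
    for example in (400, 45, 3, 1.5, 1):
        print(example, classify_huge_score(example))
```

Ordering the thresholds from highest to lowest keeps the boundaries inclusive in the same way as the text (for example, a score of exactly 30 is ‘very strong’).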

HuGE scores are calculated using a Bayesian model to account for the effects of rare and common genetic variation, the proximity of a gene to the peak genetic association in genome-wide association studies and the impact of coding mutations detected in whole exome sequencing17. By aggregating genetic data across many independent studies, HuGE scores provide an unbiased and robust assessment of how strongly a gene of interest might relate to human physiology. For example, HuGE scores for peroxisome proliferator activated receptor gamma (PPARG), a master-regulator of adipogenesis, show very strong associations (HuGE ≥ 30) with body fat percentage and muscle fat infiltration (Fig. 1A), consistent with its established role in adipogenesis. While databases like A2F have made it straightforward to access HuGE scores for individual genes, the restriction on searching genes one-at-a-time is incompatible with high-throughput, hypothesis-generating studies where hundreds of disease-associated genes can be identified in a single omics-based experiment (Fig. 1B).

Synteny overcomes this limitation by enabling bulk querying of the A2F database (Fig. 1C), streamlining the process of integrating human genetic evidence with data from model systems. Its operation is simple, requiring only a list of genes. All results can be exported for analysis in external software and data visualisation plots can be downloaded as high-resolution vector PDFs.

To demonstrate the advantage of this approach, we used Synteny to analyse a list of 52 genes that Dornbos et al.17 previously compiled to assess the use of human genetic evidence by metabolism researchers working with model systems. Lists of 50–200 genes are commonly generated by differential expression, gene set enrichment and clustering analyses, but are typically too large for manual prioritisation. This represents an important opportunity for human genetic data to reveal phenotypic associations that might otherwise be overlooked. To accommodate the full spectrum of omics outputs, Synteny can also be used to query larger gene sets of up to 600 genes with minimal additional analysis time. The gene set selected by Dornbos et al.17 was manually curated from studies published in high profile journals between January 2017 and October 2020 that contained any of “diabetes,” “glucose,” or “insulin” in their abstracts. The links between these genes and the phenotypes of interest were supported by experiments spanning several experimental systems including cell culture, model organisms (e.g. mice and flies) and a small subset of human studies. However, examining the HuGE scores for these genes, Dornbos et al.17 identified a striking lack of genetic evidence to support therapeutic targeting of these genes, highlighting an essential need for increased use of human genetic data. Despite this, it remains challenging for basic researchers to apply this approach to ensure that their research is directed towards genes with the strongest links to human disease.

While manually querying each gene using the A2F portal required more than an hour, using Synteny this process was completed in just 80 s (Fig. 1D). This boost in efficiency makes this approach much more accessible than existing alternatives, enhancing the potential for increased translation.

Investigation of HuGE scores across genes most often reveals well-studied associations. However, the true value of this analysis lies in highlighting relationships that would otherwise have gone unnoticed. For example, while the link between phosphatase and tensin homolog (PTEN) and systemic glucose homeostasis is well-established18,19, the association between carnitine palmitoyltransferase I (CPT1A) and platelet number has not been widely reported previously. CPT1A plays a central role in transporting fatty acids into the mitochondria for oxidation20,21 and, consistent with human genetic evidence, recent studies have revealed that the contribution of fatty acids to ATP production in platelets is likely greater than currently appreciated22,23. Consistent with this relationship, platelet deficiency is a rare, but severe and well-recognised, complication of CPT1A deficiency in humans26. These insights suggest new avenues for exploring platelet pathology and showcase the power of unbiased analyses that are possible using Synteny.

The Human Phenotype Enrichment (HPE) tool in Synteny enables users to quantify the extent to which gene sets of interest are supported by human genetic evidence. Similar to existing over-representation analyses for molecular functions, HPE works by comparing HuGE scores for a user-supplied gene set against those obtained by randomly sampling the genome (or a user-supplied background gene set). Consistent with the finding of Dornbos et al.17 that their list of 52 genes was, overall, not supported by human genetic evidence, HPE using Synteny failed to detect the significant enrichment for glycaemic traits that would be expected if these genes were supported by human genetic evidence (Fig. 1E). Instead, the 52 genes were enriched for early life Body Mass Index (BMI), Alzheimer’s disease (AD) and age-related macular degeneration (AMD). This outcome supports the conclusions of Dornbos et al.17, emphasising the importance of interrogating results in the context of human genetics to maximise the extent to which advances in basic research can be translated into human therapeutic breakthroughs. Many existing approaches using human genetic data are designed to test single phenotypes and so here we report both unadjusted and FDR-corrected p values to provide compatibility with these approaches.
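
To illustrate how unadjusted and FDR-corrected p values relate, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-phenotype p values. This is an assumption for illustration: the text specifies only that an FDR correction is used, not which procedure, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the original order.

    Assumption: the FDR correction is Benjamini-Hochberg; the source text
    does not name the procedure.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    unadjusted = [0.01, 0.04, 0.03, 0.5]  # toy values, one per phenotype
    print(benjamini_hochberg(unadjusted))
```

Because every phenotype in the portal is tested, the adjusted values are typically much larger than the unadjusted ones, which is why reporting both aids comparison with single-phenotype approaches.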

To evaluate the sensitivity and robustness of HPE, we performed a series of benchmarking experiments to determine the degree of enrichment required to reach statistical significance. These power analyses were performed using gene sets of varying size between 10 and 300 genes, obtained by weighted sampling of the HuGE score distribution (Fig. 2A: i). Using obesity as an example trait, 90% power was obtained at 4-fold enrichment with 100 genes (Fig. 2A: ii). To evaluate sensitivity more broadly and test for performance bias between different disease contexts, we extended this analysis to 15 randomly sampled human traits from each of 16 trait categories (240 phenotypes in total; Fig. 2B). With gene sets containing 100 genes, the median enrichment required to reach 80% power was 1.66-fold, suggesting that HPE is highly sensitive for phenome-wide enrichment analysis. Haematological traits were an important outlier, indicating that data contained within A2F may be biased against detecting enrichment within this disease context. A potential explanation is the relatively limited availability of large-scale genetic studies for haematological traits compared to other disease areas, meaning that genetic effects on these traits may not be effectively captured by the HuGE score framework. However, broadly speaking, the diversity of phenotypes and the consistency of performance across other major disease areas support the use of A2F as an unbiased tool for functional assessment.

To assess the robustness of HPE against varying gene set compositions, we used HPE to analyse several systematically curated HALLMARK pathways (Fig. 2C). This identified enrichments that are consistent with known phenotypic associations. For example, the ‘Adipogenesis’ pathway was enriched for adiponectin (an adipokine), body fat percentage and plasma triglyceride (TG) levels; the ‘Pancreatic β cells’ pathway was enriched for fasting glucose and traits related to insulin secretion, a key function of β cells; and the ‘Inflammatory response’ pathway was enriched for Crohn’s disease and coronary artery disease, both well-studied inflammatory conditions. These results indicate that HPE can infer true biological meaning from user-supplied gene sets. Together, these analyses demonstrate that HPE is an effective approach for identifying prospective functions in a user-specified gene set.

HPE aggregates enrichment signals over entire gene sets, and it is therefore important to determine whether findings are driven by a small number of genes or are distributed across the entire gene set. Synteny facilitates this by automatically visualising the contributions of user-supplied genes to an enriched phenotype. For example, while the enrichment for adiponectin in the ‘Adipogenesis’ gene set seems to be primarily driven by the inclusion of the adiponectin (ADIPOQ) gene, the Crohn’s disease enrichment for the ‘Inflammatory response’ pathway is more evenly distributed.

Enabling users to determine which genes contribute most strongly to human traits also addresses a core challenge in omics analysis. Genes belonging to the same pathway are often co-regulated24, and this can obscure which genes causally influence the phenotype. For example, although omics analysis might identify several glycolytic enzymes as hits, it may be unclear which gene/s should be pursued experimentally. This is important since, while there are 12 steps in glycolysis – from glucose import to lactate export – only 4 of these appear to regulate total flux through the pathway25. Selecting genes that do not regulate flux might result in a false negative result in experimental follow-up studies. Synteny addresses this challenge by providing access to the effects of human genetic variation. Because the effects of genetic variation are independent of pathway membership, this enables users to disentangle the contributions of individual genes to human disease and provides a significant advantage compared to gene prioritisation approaches that focus on gene ontology and pathway membership. Applied to the analysis of glycolytic genes expressed in skeletal muscle, Synteny identified very strong genetic evidence (HuGE ≥ 30) for several enzymes that Tanner et al.25 previously showed to regulate total glycolytic flux (Fig. 2D). Thus, Synteny provides a powerful tool to assist users in identifying high-priority genes for mechanistic follow-up.

Fig. 1

Synteny powers improved access to human genetic evidence. (A) Human genetic evidence (HuGE) scores aggregate genetic data from many studies to summarise the strength of association between genes and phenotypes17, enabling gene function to be rapidly assessed. Example of HuGE score plot linking PPARG to 1175 phenotypes. (B) Schematic of the current implementation of the HuGE score framework via the Association to Function web portal. (C) Schematic illustrating the high throughput implementation in Synteny. API = application programming interface. (D & E) Examples of data visualisation and analysis in Synteny for 52 genes hypothesised to be linked to glucose metabolism17. (D) Heatmap showing associations between genes and human phenotypes, filtered for genes with HuGE scores ≥ 30. (E) Results of HPE. Asterisks represent unadjusted p values.

Fig. 2

Benchmarking for human phenotype enrichment. (A) Example of sensitivity analysis performed using Obesity as a model trait. Random weighted sampling (i) was used to generate enriched gene sets with different numbers of genes that were then subjected to HPE (ii). (B) Power simulations for 15 randomly sampled phenotypes from 16 trait categories, with gene sets of size n = 100. Each dot is the average of 1000 permutations per phenotype, showing the degree of enrichment required to reach 80% power. Median enrichment for 80% power (x̃) = 1.66-fold. (C) HPE using gene sets from HALLMARK pathways. Genes contributing to the top enrichments for ‘Adipogenesis’, ‘Pancreatic β cells’ and ‘Inflammatory response’ are shown in i-iii, respectively. (D) Heatmap presenting HuGE scores for glycolytic enzymes expressed in skeletal muscle for selected phenotypes. Enzymes that regulate total glycolytic flux25 are highlighted in red.

Synteny leverages human genetic data for discovery

Less than half of all human genes have been studied mechanistically4,27,28. To evaluate whether Synteny could be used to infer protein function, we applied it to the solute carrier (SLC) superfamily, containing over 400 genes. We chose this family since, despite its members playing crucial roles in many aspects of biology, the substrates and biological functions of many of these proteins remain unknown29.

Consistent with their roles as transporters, HPE analysis identified enrichment of the SLC family in regulating circulating levels of a variety of metabolites including creatinine, urate and sodium (Fig. 3A). Several human diseases related to dysregulated ion transport, including chronic kidney disease and epilepsy with sclerosis, were also enriched.

To identify putative functions for orphan SLC members (SLCs without known substrates), we identified SLCs with similar associations to human traits by correlating HuGE scores across genes. Such guilt-by-association strategies are implemented in several established tools including WGCNA30, GeneBridge28 and QENIE31, all of which assign putative functions to genes by means of correlations or co-expression networks. This identified 6 clusters containing more than 2 SLC proteins (Fig. 3B). Each SLC cluster was enriched for at least one human trait (Fig. 3C). Strikingly, cluster 4 was enriched for genes associating with serum urate levels; 3 of the 5 proteins encoded by genes in this cluster (SLC2A9, SLC22A12, SLC22A13) have previously been shown to actively transport urate32,33,34. This raised the intriguing possibility that the remaining two SLC members in this cluster (SLC5A2 and SLC5A9) may also function as urate transporters.
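
The guilt-by-association step can be sketched as follows: compute the Pearson correlation between the HuGE score profiles of each pair of genes, link pairs whose correlation exceeds a threshold, and report groups of linked genes. This is a simplified Python illustration under stated assumptions. Grouping by connected components is our choice for the sketch (the text does not specify how clusters were formed), the p value filter is omitted, and the gene names and scores in the usage example are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_groups(profiles, r_min=0.95):
    """Group genes whose HuGE score profiles correlate above r_min.

    profiles: dict mapping gene -> list of HuGE scores over a shared,
    identically ordered list of phenotypes.
    Assumption: groups are connected components of the thresholded
    correlation graph (a union-find over gene pairs).
    """
    genes = sorted(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) > r_min:
                parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    # Only report groups with at least two members.
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    toy_profiles = {  # hypothetical genes and scores, four phenotypes each
        "GENE_A": [1, 2, 3, 40],
        "GENE_B": [2, 4, 6, 80],
        "GENE_C": [40, 3, 2, 1],
        "GENE_D": [1, 1, 30, 1],
    }
    print(correlated_groups(toy_profiles))
```

In this toy example only GENE_A and GENE_B share a phenotype profile, so they form the single reported group; a gene of unknown function falling into such a group inherits a candidate function from its better-characterised neighbours.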

SLC5A2 encodes the SGLT2 protein, a sodium-dependent glucose transporter highly expressed in the kidney and a well-established target for Type 2 Diabetes35,36. Both genetic and pharmacological evidence support a potential role of SLC5A2 in urate transport, although this has not been directly tested yet. Specifically, a coding variant in SLC5A2 has been linked to uricosuria37, a condition with high urate levels in the urine, and pharmacological inhibition of SGLT2 decreases the re-absorption of urate from the urine38. While it has been suggested that this effect is mediated indirectly through the effects of elevated urinary glucose on urate transport by GLUT9 (SLC2A9)37 and URAT1 (SLC22A12)37,39, studies have shown that GLUT9 is dispensable for the hyperuricosuria38 and the impact of glucose on URAT1 remains unclear. This raises the possibility that SGLT2 may directly mediate urate transport, a hypothesis worthy of future investigation.

SLC5A9 encodes the SGLT4 protein, a sodium-dependent sugar transporter expressed in the small intestine40,41, although little else is known about its function. Combined with the fact that approximately 30% of uric acid is excreted into the small intestine, the known overlap between urate and sugar transporters32 supports the possibility that SLC5A9 could also transport urate. Enhancing intestinal urate disposal has been proposed as a therapeutic target for chronic kidney disease42 and so it will be important for future studies to clarify whether this might be possible via targeting SLC5A9.

These examples highlight how human genetic data can be leveraged to gain insights into protein function and emphasise the utility of Synteny in informing targeted validation experiments.

Fig. 3

Synteny leverages human genetic data for discovery. (A) Results of HPE for 413 members of the solute carrier (SLC) superfamily, only possible using Synteny. Asterisks represent unadjusted p values. (B) Correlation heatmap of HuGE scores for SLC genes with more than one correlation of R > 0.95 and p < 0.01. Clusters contain SLC genes with similar phenotypic associations. (C) Top scoring phenotype for each SLC cluster.

Discussion

Combined with existing bioinformatic tools for identifying genes of interest, Synteny provides a systematic and scalable pipeline to maximise the biological insight that can be drawn from omics studies (https://bigproteomics.shinyapps.io/Synteny/).

Human genetic databases contain vast amounts of information and facilitate targeted inspection of individual genes. Synteny complements these resources by helping researchers prioritise where to look, highlighting which genes or pathways are most strongly supported by human genetic evidence. To streamline downstream analysis, Synteny automatically generates hyperlinks to the A2F database for genes and phenotypes of interest. This functionality enables users to transition seamlessly from broad scale discovery to targeted experimental follow-up.

Several new analyses are possible using Synteny. By searching for genes with shared disease associations, it is possible to assign putative functions to proteins. In an analysis of the SLC family, this approach led us to propose SGLT2 and SGLT4 as putative urate transporters. Combined with the observation that both genetic and pharmacological manipulation of SGLT2 influences plasma urate levels37,38, the fact that these genes clustered together with three known urate transporters provides considerable support for the use of human genetic evidence to group genes by their physiological functions.

Although several well-recognised tools exist to perform functional enrichment on user-supplied gene sets, including DAVID11 and g:Profiler13, these tools rely on gene expression, protein interaction or pathway databases that cannot distinguish whether enriched terms represent causes or consequences of the pathology. In contrast, HPE enables users to quantify the strength of causation within gene sets, since human genetic associations are derived from naturally occurring variation that precedes disease onset. While HPE has its own limitations, such as being restricted to genes with detectable variants and potential population-specific biases, it provides an important complementary approach to traditional enrichment methods. Beyond functional annotation, HPE also possesses potential clinical applications. For rare diseases, HPE might reveal related phenotypes that could serve to improve diagnosis or treatment, highlighting the importance of comprehensive phenotype analysis.

As with all tools, however, Synteny is not without limitations. Firstly, analysis of genes from diverse tissues could limit mechanistic insight in cases where gene function is not well characterised. To mitigate this, we suggest users stratify genes by tissue expression prior to HPE. This issue does not apply to many common situations where gene sets are obtained by analysing omics data in a specific tissue. Secondly, Synteny does not currently incorporate information from Mendelian randomisation or protein quantitative trait loci (pQTL) studies linking proteins to disease. These datasets offer an additional form of evidence to aid candidate prioritisation. We encourage users to make use of data from proteogenomics consortia such as SCALLOP43 and the UK Biobank44 to perform these analyses once they have identified high-priority candidates using Synteny. Finally, Synteny does not currently support custom trait-gene mapping using local GWAS data. We encourage interested users to contact the A2F portal to arrange for new study data to be publicly deposited and made broadly available to the community.

Synteny addresses a fundamental challenge in modern biology by bridging the gap between studies in model systems and human genetic studies. By interrogating findings from studies in model systems in the broader context of human genetic evidence, researchers can gain insight into potentially causal genes and streamline the selection of targets for experimental follow-up. This approach also offers considerable benefits to those working with human genetic data. While human genome-wide association studies have identified thousands of disease-associated loci, these loci often contain many genes and so pinpointing the causal gene/s within these loci remains a persistent challenge. By enabling users to cross-reference GWAS candidates with functional evidence from model systems, Synteny helps to identify which genes within complex loci are most likely to be causal and should be prioritised for further study.

Methods

Statistical analyses

All analyses and data visualisation were conducted using the R programming environment45. Unless otherwise stated, correlations were assessed by Pearson correlation analysis. Significance is denoted as follows: p < 0.05 by *, p < 0.01 by **, p < 0.001 by *** and p < 0.0001 by ****.
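
For reference, the asterisk convention above corresponds to the following mapping. This is an illustrative Python sketch with a hypothetical function name; the analyses in this study were performed in R.

```python
def significance_stars(p: float) -> str:
    """Return the asterisk label for a p value, using the cut-offs stated above."""
    # Check the strictest threshold first so the most asterisks win.
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


if __name__ == "__main__":
    for p in (0.2, 0.03, 0.004, 0.0005, 0.00001):
        print(p, repr(significance_stars(p)))
```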

Human phenotype enrichment

Human genetic evidence (HuGE) scores were downloaded for each gene. To assess enrichment for a particular phenotype, the same number of genes were randomly selected from the Association to Function database (for naïve enrichment) or randomly from a set of background genes (for background-corrected enrichment). This was repeated for 10,000 permutations to enable the average HuGE score in the user-defined gene set to be compared to the sampled distribution/s. An enrichment score was calculated as the mean fold change of the average scores, and significance was determined by calculating the proportion of sampled scores exceeding that of the user-defined set. Multiple testing correction was performed using a false-discovery rate (FDR) correction to account for the number of phenotypes tested.
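
The permutation procedure described above can be sketched for a single phenotype as follows. This is a minimal Python illustration, not the R implementation used in Synteny. The function and argument names are hypothetical, the fold change is computed as the observed mean divided by the mean of the sampled means (one reading of the description above), and sampled means equal to the observed mean are counted as exceeding it, which is a conservative assumption.

```python
import random
from statistics import mean


def phenotype_enrichment(scores, gene_set, background=None, n_perm=10_000, seed=0):
    """Permutation-based enrichment of a gene set for one phenotype.

    scores: dict mapping gene -> HuGE score for the phenotype of interest.
    gene_set: user-supplied genes (genes without a score are ignored).
    background: optional sampling universe; defaults to all scored genes.
    Returns (fold enrichment, empirical p value).
    """
    rng = random.Random(seed)
    universe = scores if background is None else background
    # Sorted so that results are reproducible for a given seed.
    pool = sorted(g for g in universe if g in scores)
    members = [g for g in gene_set if g in scores]
    observed = mean(scores[g] for g in members)

    # Mean score of random gene sets of the same size as the user's set.
    null_means = [
        mean(scores[g] for g in rng.sample(pool, len(members)))
        for _ in range(n_perm)
    ]

    enrichment = observed / mean(null_means)
    # Proportion of random sets scoring at least as high as the observed set.
    p_value = sum(m >= observed for m in null_means) / n_perm
    return enrichment, p_value


if __name__ == "__main__":
    # Toy data: 100 genes with no evidence and 5 genes with very strong evidence.
    toy_scores = {f"gene{i}": 1.0 for i in range(100)}
    toy_scores.update({f"hit{i}": 50.0 for i in range(5)})
    print(phenotype_enrichment(toy_scores, [f"hit{i}" for i in range(5)], n_perm=2000))
```

Supplying `background` restricts sampling to a user-defined set, corresponding to the background-corrected enrichment described above; leaving it unset samples from every scored gene.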

Benchmarking

Sensitivity analysis

Gene sets with progressively greater enrichment were generated by weighted sampling. We performed permutation testing (n = 1000) to determine the power of HPE to detect enrichment for gene sets with different levels of enrichment. Power is reported as the proportion of gene sets where enrichment was detected with p < 0.05. Phenotypes in Fig. 2B were obtained by randomly sampling 15 phenotypes per trait category.
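
A power simulation of this kind can be sketched as below. This is an illustrative Python outline under several assumptions that are not taken from the study: sampling weights are set to the HuGE score raised to a hypothetical exponent `alpha` (so that `alpha = 0` gives unweighted sampling and larger values give progressively more enriched sets), weighted sampling without replacement uses the Efraimidis-Spirakis key method, and one null distribution is shared across all simulated gene sets of the same size to reduce computation.

```python
import random
from statistics import mean


def weighted_gene_set(scores, k, alpha, rng):
    """Draw k distinct genes with probability weights score ** alpha.

    Uses Efraimidis-Spirakis keys (u ** (1 / weight)); all scores must be > 0.
    """
    keyed = [
        (rng.random() ** (1.0 / (scores[g] ** alpha)), g)
        for g in sorted(scores)
    ]
    # The k largest keys form a weighted sample without replacement.
    return [g for _, g in sorted(keyed, reverse=True)[:k]]


def estimate_power(scores, k, alpha, n_sets=1000, n_null=1000, p_cutoff=0.05, seed=0):
    """Return (power, mean fold enrichment) for simulated gene sets of size k."""
    rng = random.Random(seed)
    genes = sorted(scores)
    # Null distribution: mean scores of unweighted random sets of size k.
    null = [mean(scores[g] for g in rng.sample(genes, k)) for _ in range(n_null)]
    null_mean = mean(null)

    detected = 0
    fold_changes = []
    for _ in range(n_sets):
        simulated = weighted_gene_set(scores, k, alpha, rng)
        observed = mean(scores[g] for g in simulated)
        p_value = sum(m >= observed for m in null) / n_null
        detected += p_value < p_cutoff
        fold_changes.append(observed / null_mean)
    # Power is the proportion of simulated sets detected at the chosen cut-off.
    return detected / n_sets, mean(fold_changes)


if __name__ == "__main__":
    # Toy phenotype: 200 genes without evidence, 20 genes with very strong evidence.
    toy_scores = {f"gene{i}": 1.0 for i in range(200)}
    toy_scores.update({f"hit{i}": 40.0 for i in range(20)})
    for alpha in (0.0, 0.5, 2.0):
        print(alpha, estimate_power(toy_scores, k=10, alpha=alpha, n_sets=200, n_null=500))
```

Sweeping `alpha` and recording the fold enrichment at which power first reaches a target (for example 80%) gives one value per phenotype, analogous to the quantity summarised in Fig. 2B.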

HALLMARK analysis

Human HALLMARK gene sets were downloaded from https://www.gsea-msigdb.org/gsea/msigdb/human/genesets.jsp?collection=H and analysed using HPE. Significantly enriched phenotypes are reported.

Synteny

Web server implementation

Synteny is implemented as a Shiny application using the R programming language and is hosted on a shinyapps.io server.

Privacy and security

Gene lists are only uploaded if the user accesses Synteny via the web server. No user data are retained following session termination. Secure HTTPS connections are used to transfer data to and from the server. The country in which each user is located is logged using a Google Analytics cookie. These data are used solely to monitor global adoption of the tool and ensure server requirements are sufficient to meet demand. No other data are recorded.

Local installation

The code required to run Synteny locally is available in the supplementary materials. After installing the required R packages, Synteny will still require an internet connection to connect to the Association to Function application programming interface. No data will be uploaded to the Synteny shinyapps.io server if running locally.

Packages and databases

Synteny maps gene names from any organism to human orthologs using the Alliance for Genome Resources database14. Human genetic evidence (HuGE) scores are retrieved via an application programming interface connected to the Association to Function database using the jsonlite package46. Static heatmaps are created using the ComplexHeatmap package47 and interactive plots are created using a combination of ggplot248 and plotly49.

Exporting results

Results and figures can be explored and visualised within Synteny itself or exported for analysis in the user’s software of choice. Data are exportable as Excel sheets (.xlsx) and figures are exportable as full resolution vector PDFs.

Data availability

Data relating to HuGE scores and phenotypic enrichment (contained within Figs. 1 and 3) are included in this published article (and its Supplementary Information files). Raw datasets generated during the HPE benchmarking experiments (Fig. 2) are available from the corresponding author upon reasonable request.

References

  1. Müller, J. B. et al. The proteome landscape of the kingdoms of life. Nat. 2020. 582 (7813 582), 592–596. https://doi.org/10.1038/s41586-020-2402-x (2020).

    Article  Google Scholar 

  2. Hutchison, C. A. et al. Design and synthesis of a minimal bacterial genome. Sci. (1979). 351https://doi.org/10.1126/SCIENCE.AAD6253/SUPPL_FILE/AAD6253-HUTCHISON-SM.PDF (2016).

  3. Kustatscher, G. et al. Understudied proteins: opportunities and challenges for functional proteomics. Nat. Methods 19, 774–779. https://doi.org/10.1038/S41592-022-01454-X (2022).

  4. Edwards, A. M. et al. Too many roads not taken. Nature 470, 163–165. https://doi.org/10.1038/470163a (2011).

  5. Keller, M. P. et al. Gene loci associated with insulin secretion in islets from nondiabetic mice. J. Clin. Invest. 129, 4419–4432. https://doi.org/10.1172/JCI129143 (2019).

  6. Li, Z. et al. Integrating mouse and human genetic data to move beyond GWAS and identify causal genes in cholesterol metabolism. Cell Metab. 31, 741–754.e5. https://doi.org/10.1016/j.cmet.2020.02.015 (2020).

  7. Parker, B. L. et al. An integrative systems genetic analysis of mammalian lipid metabolism. Nature 567, 187. https://doi.org/10.1038/S41586-019-0984-Y (2019).

  8. Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533. https://doi.org/10.1038/S41588-021-00945-5 (2021).

  9. Wang, J. et al. MARRVEL: integration of human and model organism genetic resources to facilitate functional annotation of the human genome. Am. J. Hum. Genet. 100, 843–853. https://doi.org/10.1016/J.AJHG.2017.04.010 (2017).

  10. Karczewski, K. J. et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genom. 2. https://doi.org/10.1016/J.XGEN.2022.100168 (2022).

  11. Sherman, B. T. et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216–W221. https://doi.org/10.1093/NAR/GKAC194 (2022).

  12. Mi, H., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. Large-scale gene function analysis with PANTHER classification system. Nat. Protoc. 8, 1551. https://doi.org/10.1038/NPROT.2013.092 (2013).

  13. Kolberg, L. et al. g:Profiler—interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 51, W207–W212. https://doi.org/10.1093/NAR/GKAD347 (2023).

  14. Bult, C. J. & Sternberg, P. W. The Alliance of Genome Resources: transforming comparative genomics. Mamm. Genome 34, 531. https://doi.org/10.1007/S00335-023-10015-2 (2023).

  15. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 1–9. https://doi.org/10.1038/ncomms6890 (2015).

  16. Eyre, T. A., Wright, M. W., Lush, M. J. & Bruford, E. A. HCOP: a searchable database of human orthology predictions. Brief. Bioinform. 8, 2–5. https://doi.org/10.1093/BIB/BBL030 (2007).

  17. Dornbos, P. et al. Evaluating human genetic support for hypothesized metabolic disease genes. Cell Metab. 34, 661–666. https://doi.org/10.1016/j.cmet.2022.03.011 (2022).

  18. Pal, A. et al. PTEN mutations as a cause of constitutive insulin sensitivity and obesity. N. Engl. J. Med. 367, 1002–1011. https://doi.org/10.1056/NEJMOA1113966 (2012).

  19. Wijesekara, N. et al. Muscle-specific Pten deletion protects against insulin resistance and diabetes. Mol. Cell. Biol. 25, 1135–1145. https://doi.org/10.1128/MCB.25.3.1135-1145.2005 (2005).

  20. McGarry, J. D., Leatherman, G. F. & Foster, D. W. Carnitine palmitoyltransferase I. The site of inhibition of hepatic fatty acid oxidation by malonyl-CoA. J. Biol. Chem. 253, 4128–4136. https://doi.org/10.1016/S0021-9258(17)34693-8 (1978).

  21. Schlaepfer, I. R. & Joshi, M. CPT1A-mediated fat oxidation, mechanisms, and therapeutic potential. Endocrinology 161. https://doi.org/10.1210/ENDOCR/BQZ046 (2020).

  22. Kulkarni, P. P. et al. Fatty acid oxidation fuels agonist-induced platelet activation and thrombus formation: targeting β-oxidation of fatty acids as an effective anti-platelet strategy. FASEB J. 37. https://doi.org/10.1096/FJ.202201321RR (2023).

  23. Kulkarni, P. P., Ekhlak, M. & Dash, D. Energy metabolism in platelets fuels thrombus formation: halting the thrombosis engine with small-molecule modulators of platelet metabolism. Metabolism 145, 155596. https://doi.org/10.1016/J.METABOL.2023.155596 (2023).

  24. Öztürk, M. et al. Proteome effects of genome-wide single gene perturbations. Nat. Commun. 13, 1–10. https://doi.org/10.1038/s41467-022-33814-8 (2022).

  25. Tanner, L. B. et al. Four key steps control glycolytic flux in mammalian cells. Cell Syst. 7, 49–62.e8. https://doi.org/10.1016/J.CELS.2018.06.003 (2018).

  26. Lee, K., Pritchard, A. & Ahmad, A. Carnitine Palmitoyltransferase 1A Deficiency. GeneReviews®. (2025).

  27. Stoeger, T., Gerlach, M., Morimoto, R. I. & Nunes Amaral, L. A. Large-scale investigation of the reasons why potentially important genes are ignored. PLoS Biol. 16, e2006643. https://doi.org/10.1371/JOURNAL.PBIO.2006643 (2018).

  28. Li, H. et al. Identifying gene function and module connections by the integration of multispecies expression compendia. Genome Res. 29, 2034–2045. https://doi.org/10.1101/GR.251983.119 (2019).

  29. Superti-Furga, G. et al. The RESOLUTE consortium: unlocking SLC transporters for drug discovery. Nat. Rev. Drug Discov. 19, 429–430. https://doi.org/10.1038/d41573-020-00056-6 (2020).

  30. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559. https://doi.org/10.1186/1471-2105-9-559 (2008).

  31. Seldin, M. M. et al. A strategy for discovery of endocrine interactions with application to whole-body metabolism. Cell Metab. 27, 1138–1155.e6. https://doi.org/10.1016/J.CMET.2018.03.015 (2018).

  32. Vitart, V. et al. SLC2A9 is a newly identified urate transporter influencing serum urate concentration, urate excretion and gout. Nat. Genet. 40, 437–442. https://doi.org/10.1038/NG.106 (2008).

  33. Dai, Y. & Lee, C. H. Transport mechanism and structural pharmacology of human urate transporter URAT1. Cell Res. 34, 776–787. https://doi.org/10.1038/s41422-024-01023-1 (2024).

  34. Bahn, A. et al. Identification of a new urate and high affinity nicotinate transporter, hOAT10 (SLC22A13). J. Biol. Chem. https://doi.org/10.1074/jbc.M800737200 (2008).

  35. Ferrannini, E. & Solini, A. SGLT2 inhibition in diabetes mellitus: rationale and clinical prospects. Nat. Rev. Endocrinol. 8, 495–502. https://doi.org/10.1038/nrendo.2011.243 (2012).

  36. Chao, E. C. & Henry, R. R. SGLT2 inhibition: a novel strategy for diabetes treatment. Nat. Rev. Drug Discov. 9. https://doi.org/10.1038/nrd3180 (2010).

  37. Inthasot, S. et al. A novel heterozygous likely pathogenic SLC5A2 variant in a diabetic patient with glucosuria and aminoaciduria. Endocrinol. Diabetes Metab. Case Rep. 2024. https://doi.org/10.1530/EDM-24-0065 (2024).

  38. Novikov, A. et al. SGLT2 inhibition and renal urate excretion: role of luminal glucose, GLUT9, and URAT1. Am. J. Physiol. Ren. Physiol. 316, F173–F185. https://doi.org/10.1152/AJPRENAL.00462.2018 (2019).

  39. Dong, M. et al. The mechanism of sodium-glucose cotransporter-2 inhibitors in reducing uric acid in type 2 diabetes mellitus. Diabetes Metab. Syndr. Obes. 16, 437–445. https://doi.org/10.2147/DMSO.S399343 (2023).

  40. Tazawa, S. et al. SLC5A9/SGLT4, a new Na+-dependent glucose transporter, is an essential transporter for mannose, 1,5-anhydro-D-glucitol, and fructose. Life Sci. 76, 1039–1050. https://doi.org/10.1016/J.LFS.2004.10.016 (2005).

  41. Sano, R., Shinozaki, Y. & Ohta, T. Sodium–glucose cotransporters: functional properties and pharmaceutical potential. J. Diabetes Investig. 11, 770. https://doi.org/10.1111/JDI.13255 (2020).

  42. Johnson, R. J. Intestinal hyperuricemia as a driving mechanism for CKD. Am. J. Kidney Dis. 81, 127–130. https://doi.org/10.1053/j.ajkd.2022.08.001 (2023).

  43. Zhao, J. H. et al. Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets. Nat. Immunol. 24, 1540–1551. https://doi.org/10.1038/s41590-023-01588-w (2023).

  44. Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338. https://doi.org/10.1038/s41586-023-06592-6 (2023).

  45. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2021).

  46. Ooms, J. A simple and robust JSON parser and generator for R [R package jsonlite version 1.8.9]. CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.PACKAGE.JSONLITE (2024).

  47. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849. https://doi.org/10.1093/BIOINFORMATICS/BTW313 (2016).

  48. Wickham, H. ggplot2. https://doi.org/10.1007/978-3-319-24277-4 (2016).

  49. Sievert, C. Interactive web-based data visualization with R, plotly, and shiny. J. R. Stat. Soc. Ser. A Stat. Soc. 184, 1150. https://doi.org/10.1111/RSSA.12692 (2021).

Acknowledgements

This work was supported by an Australian Research Council Laureate Fellowship (to DEJ) and an Australian Government Research Training Program Scholarship (to HBC). The content is solely the responsibility of the authors and does not necessarily represent the official views of the ARC. We would like to thank several colleagues for their constructive feedback: Marcus Seldin for suggestions on enabling broader use of the platform by incorporating the Alliance of Genome Resources database; Alice Williamson for suggestions on heatmap visualisation settings and on improving the documentation for HuGE correlation analysis; Rob Williams for suggestions on including metadata in exported results; and Mark Keller for suggestions on using Synteny with large (> 500) numbers of genes. These conversations greatly improved the user experience of Synteny. We welcome future suggestions from the community.

Funding

This work was funded by an Australian Research Council Laureate Fellowship to DEJ (Project ID FL200100096).

Author information

Authors and Affiliations

  1. School of Life and Environmental Sciences, University of Sydney, Camperdown, NSW, Australia

    Harry B. Cutler, Jacqueline Stöckli, Søren Madsen, Stewart W.C. Masson, Oliver K. Fuller & David E. James

  2. Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia

    Harry B. Cutler, Jacqueline Stöckli, Søren Madsen, Stewart W.C. Masson, Oliver K. Fuller, Tom Buckingham Shum & David E. James

  3. School of Medical Sciences, University of Sydney, Camperdown, NSW, Australia

    David E. James

Authors

  1. Harry B. Cutler
  2. Jacqueline Stöckli
  3. Søren Madsen
  4. Stewart W.C. Masson
  5. Oliver K. Fuller
  6. Tom Buckingham Shum
  7. David E. James

Contributions

Conceptualisation, HBC, JS, SM, SWCM, OKF, DEJ; Methodology, HBC; Formal Analysis, HBC; Data curation, HBC; Writing – Original Draft, HBC; Writing – Review & Editing, HBC, JS, SM, SWCM, OKF, DEJ; Visualisation, HBC, TBS; Supervision, DEJ; Funding Acquisition, DEJ.

Corresponding author

Correspondence to David E. James.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

About this article

Cite this article

Cutler, H.B., Stöckli, J., Madsen, S. et al. Synteny – a high throughput web tool to streamline causal gene prioritisation and provide insight into protein function. Sci Rep 15, 44761 (2025). https://doi.org/10.1038/s41598-025-28473-w

  • DOI: https://doi.org/10.1038/s41598-025-28473-w

Keywords