Aggregation of gene regulatory information and knowledge on FAIR principles enables discovery of pathogenic gene regulatory variants
PMC12967215 · DOI: 10.1093/bioinformatics/btag013
Gap Declaration
We focused on cis-effect variants and regulatory regions located within gene bodies or upstream and downstream of genes. This allowed us to examine the transcription factor binding site-rich promoter and promoter-proximal regions, as well as regions close to the 5′ and 3′ UTRs, which have been shown to harbor regulatory motifs. Nevertheless, more distal regions may still participate in the regulation of gene transcription through chromatin looping and warrant future investigation using spatial regulatory information. Beyond congenital heart disease (CHD), we recognize that other developmental disorders, such as specific subtypes of epilepsy, autism, attention-deficit/hyperactivity disorder, and neurocognitive disorders, are also thought to be attributable in part to mutations in many genes that are developmentally regulated by thousands of regulatory elements. As information about gene regulation accumulates, we anticipate that the approaches presented here will be able to aggregate it at scale, enabling the discovery of disease-associated regulatory regions in developmental disorders that affect all human organs, including the brain.
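The cis-region scoping described above can be illustrated with a minimal sketch. It is not the authors' tooling: the `Gene` and `Variant` classes, the `classify_cis` function, and the 10 kb flank are all hypothetical, since the excerpt does not state the window size used. The sketch only shows how a variant might be labeled as lying in a gene body or in a strand-aware upstream or downstream flank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int     # 1-based position


def classify_cis(variant: Variant, gene: Gene, flank: int = 10_000) -> Optional[str]:
    """Label a variant relative to a gene, or return None if it is out of range.

    `flank` is an assumed window size, not a value taken from the paper.
    Upstream/downstream are reported relative to the gene's strand.
    """
    if variant.chrom != gene.chrom:
        return None
    if gene.start <= variant.pos <= gene.end:
        return "gene_body"
    if gene.start - flank <= variant.pos < gene.start:
        # Lower coordinates are upstream only for plus-strand genes.
        return "upstream" if gene.strand == "+" else "downstream"
    if gene.end < variant.pos <= gene.end + flank:
        return "downstream" if gene.strand == "+" else "upstream"
    return None  # distal: outside the assumed cis window


# Hypothetical gene and variants, for illustration only.
plus_gene = Gene("GENE1", "chr1", 100_000, 120_000, "+")
minus_gene = Gene("GENE2", "chr1", 100_000, 120_000, "-")
labels = [
    classify_cis(Variant("chr1", 95_000), plus_gene),
    classify_cis(Variant("chr1", 110_000), plus_gene),
    classify_cis(Variant("chr1", 125_000), plus_gene),
    classify_cis(Variant("chr1", 200_000), plus_gene),
    classify_cis(Variant("chr1", 95_000), minus_gene),
]
print(labels)
```

Variants that return `None` here correspond to the more distal regions that the text defers to future work based on spatial (chromatin looping) information.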
Abstract
Motivation: Methods for sharing gene regulatory information and knowledge on FAIR principles, particularly in the context of tissue-specific gene regulation, remain poorly defined and implemented, hampering discovery and clinical genetic diagnosis.
Results: We specified FAIR principles for tissue-specific gene regulatory information and knowledge; implemented them by developing a registry of regulatory elements and aggregating FAIR gene regulatory information from several major sources; developed computational tools that utilize these FAIR resources; and demonstrated their utility by associating gene regulatory variants with major subtypes of congenital heart disease.
Availability and implementation: Variant prioritization infrastructure tools are available in genboree node repositor…
Conclusions / Discussion
We anticipate that the RFC document presented here will guide the research community in aggregating gene regulatory information. While continued advances in genomic assays and data collection technologies will require specific data formats to evolve, the three core entities (variants, regulatory elements, and genes) will likely persist. Moreover, we anticipate that the distinction between data, information, and knowledge will remain relevant, particularly with the increasing adoption of agentic AI workflows in research and clinical diagnosis. We put the RFC into practice by reusing existing resources and developing new ones as needed. In contrast to genes and variants, for which canonical naming services (HGNC and CAR, respectively) existed, canonical naming services for gene regulatory elements were lacking. To address this gap, we developed the GLR service for lookup and registration of canonical reGL identifiers for gene regulatory elements. Unlike the CAR, which served as a model and as the source of some components, GLR does not yet provide a service for completely independent registration of new reGLs by users. We anticipate in…
Keeper Review
The Appreciated Gateway must be evaluated by a human keeper.
Does this declaration represent a genuine open research gap?
Structural Hole
40% bridge
Technique originates in psychology; functional analogues in the criminal justice and epidemiology literature are absent.
○ NAUGHT — Open Opportunity
No paper has claimed this gap. Appreciate the opportunity.
Provenance