Michael J. Fox once said: “Family is not an important thing. It’s everything.” In our new article in Nature, we agree and argue that collecting family data, whether from siblings or from parents and their children, is the key to answering vital questions, ranging from why intergenerational inequalities persist to why diseases run in families.
For years, biobanks have greatly expanded our knowledge of health and disease through extensive collections of genetic data from individuals. Yet, as these resources grow to include millions of participants, the insights derived from treating each person as an isolated data point are beginning to plateau. By involving family members in these studies, we can significantly enhance data quality, opening up possibilities for breakthroughs that standard population-based methods might overlook. Such a shift promises significant advances across fields, from social sciences to population genetics and biology.
As my colleague Neil Davies, Professor of Medical Statistics at University College London and lead author, notes, "Family-based biobanks offer some of the most compelling evidence about the causes of physical and mental health. By collecting data from entire families rather than individuals alone, we gain a clearer picture of the links between genetics, environment, and disease—insights that could shape more effective interventions and treatments."
Family-based sampling does add complexity to our analyses, but we believe the benefits far outweigh the challenges. Sampling siblings or parent-child pairs allows us to provide more reliable insights than traditional population-based samples. First, it can help overcome confounding in genetic research by exploiting a natural experiment: genetic material is transmitted from parents to children at random, so genetic differences between siblings are independent of the family background they share. This helps clarify true genetic effects. For instance, estimates of genetic effects from family-based genome-wide association studies (GWASs) are significantly smaller than those derived from population-based samples for several traits, including depression and educational attainment.
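To make the confounding argument concrete, here is a toy simulation. It is a minimal sketch under simplifying assumptions, not the method used in the article: the function names, the effect sizes `beta` and `gamma`, and the normally distributed "genetic score" are all hypothetical. Each family has a shared background that is correlated with the family's average genetic score, and each sibling's score deviates from that average at random.

```python
import random


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_families=20000, beta=0.2, gamma=0.3, seed=1):
    """Return (population_estimate, within_family_estimate) of beta.

    beta  : hypothetical direct effect of a person's own genetic score
    gamma : hypothetical strength of the family-level confounder
    """
    rng = random.Random(seed)
    g1, g2, y1, y2 = [], [], [], []
    for _ in range(n_families):
        family_g = rng.gauss(0, 1)        # family-average genetic score
        family_env = gamma * family_g     # shared background, correlated with family_g
        # Each sibling deviates randomly from the family average
        # (a stand-in for random transmission of genetic material).
        sib_a = family_g + rng.gauss(0, 1)
        sib_b = family_g + rng.gauss(0, 1)
        g1.append(sib_a)
        g2.append(sib_b)
        y1.append(beta * sib_a + family_env + rng.gauss(0, 1))
        y2.append(beta * sib_b + family_env + rng.gauss(0, 1))

    # Population-style estimate: one person per family, ignoring relatedness.
    population = ols_slope(g1, y1)
    # Within-family estimate: sibling differences cancel the shared background.
    within = ols_slope(
        [a - b for a, b in zip(g1, g2)],
        [a - b for a, b in zip(y1, y2)],
    )
    return population, within


if __name__ == "__main__":
    population, within = simulate()
    print(f"population estimate:    {population:.3f}")
    print(f"within-family estimate: {within:.3f}")
```

In this setup the population estimate converges to roughly `beta + gamma / 2`, because half of the variance in an individual's score comes from the family average, while the sibling-difference estimate converges to `beta` alone. The sketch illustrates only the direction of the bias; real family-based GWASs must also handle issues such as assortative mating and measurement error, which this sketch ignores.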
Second, it enables the collection and analysis of biological and phenotypic information spanning multiple generations, allowing scholars to study how families shape outcomes and how relatives influence one another.
Third, family-based samples can improve data quality and shed new light on fundamental questions across various fields that are difficult to explore using population-based samples. These topics range from the effects of sociodemographic factors on health and wellbeing to biological issues such as the frequency of genetic mutations and the rate of recombination.
We have also considered costs. Professor Matthew Keller, Director of the Institute for Behavioral Genetics at the University of Colorado Boulder and senior author of the article, stresses that while family data may be underutilised, its transformative potential is immense: “This paper calls attention to the insufficient family data in current biobanks,” he remarks. “The costs of collecting family data are minimal compared to the substantial benefits. Including family data could reshape our understanding of mental health conditions and address long-standing limitations in clinical and social science disciplines.”
Finally, our article also touches on the practical aspects of implementing family-based biobanks, from recruiting households to linking biobank data with existing administrative records. We believe that these strategies will be essential in meeting the ambitious goals we’ve outlined and paving the way for a deeper, more comprehensive understanding of health and disease across generations.
Acknowledgements:
This article, originally published as 'The Transformative Potential of Family-Based Biobanks' on October 28, 2024, by the Leverhulme Centre for Demographic Science, is republished here with minor adjustments.
This publication is part of the Mapineq project, which has received funding from the European Union’s Horizon Europe research and innovation programme under the grant agreement No. 101061645 (www.mapineq.eu). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor the granting authority can be held responsible for them.