New Genome Sequences Reveal Undescribed African Migration
An analysis of the genomes of people from 50 ethnolinguistic groups in Africa spots 62 genes under positive selection and 3 million more genetic variants than previously documented.
A lack of ethnic diversity in global genome databases has long been a source of discussion in the scientific community. Africans in particular are underrepresented in these datasets, limiting the conclusions that can be drawn about human health and disease on the continent. Now, groups are looking to buck the trend by diversifying genomics research.
A new study published in Nature yesterday (October 28) and conducted through Human Heredity and Health in Africa (H3Africa), a consortium devoted to increasing African representation in genetics research, uncovered 3 million new genetic variants in one of the most extensive studies of African genomes reported to date.
The research team performed whole-genome sequencing analyses of 426 individuals who represent 50 ethnolinguistic groups, including previously unsampled populations, to explore the breadth of genomic diversity across Africa.
The Scientist spoke with University of the Witwatersrand geneticists Zané Lombard and Ananyo Choudhury and Baylor College of Medicine geneticist Neil Hanchard, authors of the study, about the results’ most pressing takeaways.
The Scientist: Why is it so crucial to capture the genetic diversity found on the continent?
Zané Lombard: We can ask interesting questions about population migration and population history from a genomics perspective if we have data on different ethnolinguistic groups. There’s the population genetics aspect, but I think a lot of us are also interested in health research and precision medicine, and from that aspect, the more you know about the populations that you work in and know more about the genomic diversity in patient populations as well as research participants, it informs the kind of conclusions that you can draw from the data and from the types of information that you pull. It really just informs and gives you a context from which to understand the genomics of African participants.
TS: Can you walk me through how you conducted your study?
ZL: It really started off with engaging with different principal investigators and participants across the [H3Africa] consortium and asking whether there were DNA samples available that were properly consented for this kind of research we would be doing—whole genome sequencing and depositing our data into a public repository of data.
We aggregated what we had available, and funding was provided by the [National Human Genome Research Institute] to have the sequencing done at the Baylor Sequencing Center. After that, we got together a group of African scientists from the consortium who then worked on different aspects of the data analysis.
Neil Hanchard: We were very deliberate in trying to look for populations that hadn’t been surveyed previously. That is one of the big drivers for some other more interesting results that we found.
TS: Give me an overview of what you found.
Ananyo Choudhury: The first interesting thing that we discovered was around three million novel variants in these [approximately] three hundred samples, which was quite substantial. We always knew that there is a lot of variation in Africans, but there have been quite a few genome sequencing projects in the last three to four years, so we thought it was getting to saturation. One of our questions was: Are we exhausting that possibility of discovering novel variants? Our study kind of shows that’s not the case. There is still a lot of potential if you are looking into geographic areas that have been underrepresented traditionally; you can still discover a lot of novel variants.
In terms of population genetics, we discovered a migration that was never known. It was a trans-Sahelian migration that moved into central Nigeria. The timing of the migration into Nigeria didn’t correspond to any of the previously known migrations. We also kind of identified the route of the Bantu migration to southern Africa; that’s a very hotly debated topic not only in genetics but also in linguistics.
NH: The finding of that trans-Sahelian migration into Nigeria was particularly interesting because Nigeria has been very well sampled at the genome level, and here we were finding a fairly large Nigerian group that had marked differences from the traditional groups that have been utilized. Those groups are often utilized as proxies for West Africa or sometimes for all of Africa, but they may not even be great proxies for Nigeria. The Bantu migration is this major migration of languages across the continent and so being able to fill in that part of human history and migration is also a big step forward.
ZL: From a health information point of view, we also showed that there were 62 new loci that we found to be under positive selection, and that gives you an idea of how our genome interacts with environments and how environmental forces like viral infections, et cetera, can drive genomic variation. One other thing that I think is really fascinating is that variants that were previously shown to be likely pathogenic were actually observed quite commonly in some of the population groups that we looked at. What this tells us is that those variants, because of how frequently they occur in these populations, are probably not having the kind of pathogenic impact that they previously were predicted to have. That kind of information helps us to better inform how we translate this information into health-related and disease-related information.
TS: In your opinion, what was the most surprising finding of your study?
NH: It was really the breadth of diversity. I think that oftentimes in the field there’s this view of Africa in a very singular way. It was really heartening to see the breadth of diversity across the continent and how genomic diversity interplays with things like ancestry and culture and really shows the rich tapestry that’s across the continent.
TS: How might these findings change how researchers think about biomedical research in African populations?
ZL: I think it helps inform precision medicine and medical decision-making. From a medical perspective, for instance, if you find a variant in a patient with a very rare disease and from a computational point of view it’s predicted to be pathogenic or disease-causing but you don’t have any data out there to confirm this, adding data like frequency data really helps clinicians to make informed decisions.
AC: In Nigeria, if you travel 500 kilometers, there are two populations who have very distinct genetic makeups. [We had a similar finding in] Uganda where you have two populations, and you can see from the genetic data that they have very distinct susceptibility to disease. So, putting together a very granular map like a hyper-resolution genetic map as well as a phenotype map is important. [Our findings show that] you cannot just take a country as proxy and say, “Okay, we are done.”
NH: One of the really important things to convey is that this was work done in Africa by Africans. We collaborated with individual groups that are in the US or the UK, but this [study] is really rooted in Africa.