All humanity in a genome, article by Salvador Macip

The first scientific revolution of the 21st century was in genetics, marked by the announcement, in the summer of 2000, that all human DNA had been read. This gave way to what has become known as the postgenomic era, in which advances in many fields have been exponential thanks to being able to build on this first draft of the human genome.

Because that is what it was: a draft. The information, available as of 2001, was not complete, but politically a dramatic gesture was needed to bring the Human Genome Project to a close. The project had been devised in the 1980s and started in 1990, and the announcement served above all to protect the data from a private initiative that was advancing at full speed in parallel. In April 2003 it was said that now, finally, the genome had been read completely and the project could officially be considered finished.

But that was not quite true either. Public databases held most of the sequence of units that make up a genome, yes, but it could not be considered (almost) complete until the 2017 version. And even then there were substantial holes to fill, for reasons that were either physical (DNA is structured in the form of chromosomes, whose centers and tips are very difficult to read, and some areas are inaccessible because of how they are ‘packaged’) or purely technical (parts of the genome contain many repetitions, even of entire genes, which also makes reading difficult). It was not until a few weeks ago that, thanks to the technological advances of recent years, it became possible to finish the 8% of the genome that was not available, plus the entire Y chromosome, which was the other important part that had been missing from the beginning.

There is one detail that must be taken into account. We speak of the human genome as if it were information shared by all individuals of the species, yet we know that our DNA is precisely what makes each of us unique. So, whose information is in the databases? No one's: it is what is called a reference genome (codename GRCh38), assembled from samples from anonymous donors, both male and female. 93% of the information in this genome puzzle comes from just eleven donors (70%, in fact, from one person). The result, of course, is biased, specifically toward Caucasians from upstate New York, which is where most of the volunteers came from.

This is a problem because, apart from personal differences, our genomes share similarities among individuals within what we call ethnic groups (an updated and more convoluted definition of the obsolete concept of ‘race’). Therefore, if we use the published genome as the reference for biomedical studies, we are probably missing important details that stem from our diversity. For example, in 2018, a study that analyzed the DNA of donors of African origin discovered 300 million units of genome that did not match anything known: in the databases, this ethnic variation simply did not exist.

This Eurocentric bias has practical implications. Imagine that we use genetic information to decide which drug will work best for a patient. This is what is called personalized medicine (it could be considered another scientific revolution of the 21st century, still in its infancy), and it will be increasingly present, both in primary care and in hospitals. If the data we have for comparison are based on Caucasian genome samples, the conclusions are unlikely to hold equally well in Africa or Asia, where ethnic groups have their own peculiarities, different from those of Europe.

The challenge that began with the Human Genome Project, then, is not finished yet. Now we should add as many variants as possible to the reference, creating a kind of ‘pangenome’ that serves as a point of comparison in any corner of the world. An initiative has already been launched with the aim of reading the DNA of 350 people of different ethnicities. It won’t be easy. But it is even more complicated to understand all the information that we obtain from these genomic projects. Knowing how to spell a word does not mean that we automatically understand its meaning. We still have to discover the functions of many genes, how they are activated and deactivated, how they interact with each other, how they are regulated… It is a job that we will surely be completing over the coming decades, until we learn all the mysteries that are hidden in the genes. What we will be able to do with this immense power is another story.
