Big data has disrupted many industries, including complex and highly sophisticated fields such as genomics and genetic research. A decade ago, a geneticist’s job mostly involved lab work; now, about 90% of it consists of operating computer software. Working with algorithms and data processing software has become an important part of genetic and genomic research. Even so, here at VARTEQ we believe that the potential of big data in genetics and genomics has yet to be fully unleashed.
In this article, we will explore how big data and machine learning are transforming genetics, and talk about the future of big data in genetic research and genomics, as well as the obstacles that stand in the way of fully living up to the promise of big data for personalized medicine.
Big Data in Genomics
Next-generation genomic sequencing is a field where big data, coupled with cloud technologies, is extensively applied to understand the human genome. The amount of data involved is immense: a single sequencing of the human genome produces over 200 gigabytes of raw data. Processing this data requires complex algorithms, including AI, as well as robust storage and computing capacity.
Genome sequencing is important for a number of reasons:
– Deeper scientific insight into what genes really are, including an understanding of so-called “junk DNA”, labeled as such only because we do not yet know what it does
– Understanding of how genes work together to direct the life, development, and growth of an entire organism
– Advancement of gene-editing technologies, such as CRISPR
Yes, the technology that enables us to add, remove, or change genetic material at a specific location within the genome already exists, although it is still far from mainstream. It has been tested on plants: currently, the application of CRISPR enables farmers to produce more sustainable grain crops with higher nutrient density.
Today, clinical-grade genomic software is increasingly applied to detect and categorize gene variations associated with a number of cancers and hereditary disorders. It is also used to detect the presence of SARS-CoV-2 within cells, as well as to detect and research its variants.
Big Data in Genetic Research
Unlike genomics, genetic research focuses on studying particular genes rather than sequencing the entire genome. The marriage of genetic research and data mining leads to a deeper understanding of diseases and of an individual’s predisposition to contract or develop a particular disease. As a result, genetic testing aimed at shedding light on hereditary diseases and personal genetic ancestry is now a growing trend.
At VARTEQ, for example, we supported a customer in building a commercial platform that includes test ordering, billing, insurance coverage, genetic counseling, and reporting of results. We also assisted the client in developing the back end for its unique analytical tool for genetic tests. This tool analyzes and interprets extensive information about known inherited diseases, including related mutations, frequency across populations, and the penetrance and expressivity of genetic changes.
A Peek into the Future
Next-generation genomics and data-driven genetic research have the potential to improve the efficiency of health diagnostics and treatments. Finding cures for cancer, Parkinson’s, and Alzheimer’s could be only a matter of time if this technology keeps developing at its current pace.
Access to information about the unique genetic profile of an individual opens the door to personalized medicine. In other words, we could be just one step away from customized treatments and individual medication dosages, reducing the risks of drug resistance and side effects.
As far as human gene editing is concerned, we are also one step away from technology that will help us eliminate hereditary diseases and pass better, healthier genes on to our children. As of today, the technology is mature enough to be used for editing animal genes but still has to be refined for use on humans. It could bring drastic changes to healthcare as we know it; there are also concerns that it may be unlawfully used to create smarter and more physically fit “designer babies”.
Big Healthcare Data: Challenges and Obstacles
The North American digital genome market is expected to reach US$9,594.71 million by 2027. While the stage for market growth is set by robust investment in the development of genomics, the prevalence of chronic diseases in the region, and the demand for personalized medicine, there are at least three factors that hamper it.
The data challenge
A fully sequenced human genome carries about 200 gigabytes of data, plus another 100 gigabytes once it is fully processed and analyzed. On top of that, big data analytics requires substantial storage and computing capacity. Currently, the advancement of cloud technologies is helping to overcome this challenge, so that clinics no longer have to think about accommodating on-premises servers.
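To give a feel for how quickly these numbers add up, here is a minimal back-of-the-envelope sketch in Python. It uses only the two per-genome figures quoted above; the function names and the cohort sizes (1, 100, and 10,000 genomes) are hypothetical and chosen purely for illustration. Real footprints will vary with file formats, compression, and sequencing depth.

```python
# Rough storage estimate based on the per-genome figures quoted in this article.
# Cohort sizes below are hypothetical examples, not real project data.

RAW_GB_PER_GENOME = 200        # approximate raw data per sequenced genome
PROCESSED_GB_PER_GENOME = 100  # approximate extra data after processing and analysis


def storage_needed_gb(genomes: int, keep_processed: bool = True) -> int:
    """Return an approximate storage footprint in gigabytes."""
    if genomes < 0:
        raise ValueError("genomes must be non-negative")
    per_genome = RAW_GB_PER_GENOME
    if keep_processed:
        per_genome += PROCESSED_GB_PER_GENOME
    return genomes * per_genome


def to_terabytes(gigabytes: float) -> float:
    """Convert gigabytes to decimal terabytes (1 TB = 1000 GB)."""
    return gigabytes / 1000


if __name__ == "__main__":
    for cohort in (1, 100, 10_000):
        total_gb = storage_needed_gb(cohort)
        print(f"{cohort:>6} genomes: about {to_terabytes(total_gb):,.1f} TB")
```

Under these assumptions, even a modest study of a few thousand genomes lands in the petabyte range, which is why elastic cloud storage tends to be more practical than fixed on-premises capacity.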
Data security concerns
Access to a wealth of genetic data raises legitimate privacy and security concerns. The theft or misuse of this type of data may have irreversible consequences, some of which we may not yet be aware of. However, steps are being taken to develop unified security standards for genomic data sharing on a global scale. The security of the cloud environments increasingly used for storing and processing genomic data is also raising concerns.
Lack of expertise
It’s not just a lack of data scientists: healthcare companies looking to develop their own solutions for genetic research are faced with the task of forging effective tech partnerships. Building solutions for genetic research is a niche field requiring very specific knowledge and expertise. In this regard, choosing a reliable partner to assist with creating and implementing genetic research platforms can become a challenge.
Looking for a tech partner with proven expertise in building genetic research solutions? Contact us now for a free consultation!