CRISPR-Cas9, a powerful gene-editing tool, stands out among DNA editors for its efficiency and potential. Still, there's an increasing amount of controversy and interest surrounding the accuracy and safety of this DNA-altering technology.
The basic idea behind CRISPR, which stands for "clustered regularly interspaced short palindromic repeats," is to alter a sequence of DNA to achieve a goal -- like fixing a harmful mutation.
But even in a system that prioritizes precision, CRISPR can still yield mistakes. Now, using machine learning, James Zou, PhD, assistant professor of biomedical data science, and collaborators have created an algorithm that predicts what types of mistakes are likely to occur during CRISPR editing.
A paper detailing the work appears in Nature Biotechnology. Zou is the senior author. This work is a collaboration with researchers at the Chan Zuckerberg Biohub and the University of California, San Francisco.
During a CRISPR-based edit, a strand of molecules called a guide RNA leads the DNA-slicing protein Cas9 to the section of DNA targeted for editing. Once the guide RNA binds to the DNA, Cas9 makes the cut so that DNA can be inserted or deleted.
In this way, CRISPR is often likened to editing a Word document, cutting out letters or phrases and pasting in new text. But in reality, it's messier than that. It's more like editing a Word document with your eyes closed. You know what you want to edit and where you want to edit, but there's a risk of typos -- some of which might not change the meaning of the sentence, while some might.
The same is true for gene editing using CRISPR. Cutting DNA or inserting new genetic material can trigger new unintended edits -- sometimes nucleotides (the building blocks of DNA) are lost, other times they're inexplicably tacked on. This can pose a danger to the cell, and even the organism overall.
Whether these unexpected changes will occur and whether they will be harmful is still difficult to determine -- and that's where Zou hopes his machine learning algorithm, called CRISPR Repair Outcome, or SPROUT, will come in.
"There could be quite a bit of randomness in what happens during these CRISPR edits, and that randomness can potentially create unexpected outcomes," said Zou. "So our work is motivated by whether we can quantify those odds more precisely."
Some sections of DNA are more prone to error than others -- it depends on the sequence of nucleotides. To decipher which sequences were more vulnerable, the group collected data from thousands of edits made to human immune cells using CRISPR and compiled them to train the SPROUT algorithm.
As the algorithm takes in data and "learns," it makes note of sequence patterns that seem to acquire (or, alternatively, avoid) unintended mutations during editing.
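To make "taking in" sequence data concrete, here is a minimal Python sketch of one common way to turn a DNA sequence into numbers a learning algorithm can use, known as one-hot encoding. It is an illustration only: the article does not describe SPROUT's actual input features, and the function name `one_hot_encode` is hypothetical.

```python
# Illustrative only: SPROUT's real input features are not described in this article.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Turn a DNA string into a flat list of 0/1 features, four per nucleotide."""
    features = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        # One slot per possible nucleotide; only the matching slot is set to 1.
        features.extend(1 if base == candidate else 0 for candidate in NUCLEOTIDES)
    return features


example = one_hot_encode("GATTACA")
print(len(example))  # 7 nucleotides x 4 slots each
```

With every target sequence encoded this way, a model could in principle compare which positions and nucleotides tend to accompany particular editing outcomes.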
Once the algorithm was adequately trained, the group tested its ability to predict these unforeseen DNA edits, finding that it could assess them with high accuracy in human immune cells, among other cell types. SPROUT could predict not only the odds that an error would occur but also its magnitude. For example, for a given error, it could predict how many nucleotides would be involved and whether the error would damage the gene.
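The quantities mentioned here can be illustrated with a small Python sketch that summarizes edits observed at a single target site. This is not SPROUT itself, which predicts such quantities from the DNA sequence. The function name, the signed-length input format and the use of frameshifts (insertions or deletions whose length is not a multiple of three) as a stand-in for "damage to the gene" are all assumptions made for illustration.

```python
def summarize_repair_outcomes(indel_lengths):
    """Summarize observed edits at one target site.

    indel_lengths: signed integers, one per edited sequencing read;
    positive = nucleotides inserted, negative = nucleotides deleted.
    (Hypothetical input format, chosen for this sketch.)
    """
    if not indel_lengths:
        raise ValueError("need at least one observed edit")
    total = len(indel_lengths)
    insertions = sum(1 for n in indel_lengths if n > 0)
    # Assumption: a length that is not a multiple of 3 shifts the reading
    # frame, used here as a rough proxy for an edit that damages the gene.
    frameshifts = sum(1 for n in indel_lengths if n % 3 != 0)
    return {
        "insertion_fraction": insertions / total,
        "mean_indel_size": sum(abs(n) for n in indel_lengths) / total,
        "frameshift_fraction": frameshifts / total,
    }


# Made-up example: two 1-nucleotide insertions, one 2- and one 3-nucleotide deletion.
summary = summarize_repair_outcomes([1, -2, -3, 1])
print(summary)
```

A predictive model would aim to estimate numbers like these for a target site before any experiment is run.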
The ultimate goal, Zou said, is to help researchers and doctors devising experiments using CRISPR find the most precise way to edit a gene. A scientist looking to break up a harmful mutation in the DNA might have a handful of options, and Zou's algorithm could help the scientist decide where to cut. To help implement the vision, the team has created a website for SPROUT that's freely accessible.
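As a rough sketch of how such predictions might guide the choice of where to cut, the Python snippet below ranks candidate guide RNAs by a predicted score. The guide labels and scores are made up, and `rank_cut_sites` is a hypothetical helper, not part of SPROUT or its website.

```python
def rank_cut_sites(predictions, goal="disrupt"):
    """Order candidate guides by predicted frameshift fraction.

    predictions: dict mapping a guide label to a predicted fraction (0 to 1).
    goal: "disrupt" puts the highest fraction first (to break up a gene);
          "preserve" puts the lowest fraction first.
    """
    if goal not in ("disrupt", "preserve"):
        raise ValueError("goal must be 'disrupt' or 'preserve'")
    return sorted(predictions, key=predictions.get, reverse=(goal == "disrupt"))


# Made-up predictions for three hypothetical candidate guides.
candidates = {"guide_A": 0.62, "guide_B": 0.81, "guide_C": 0.47}
print(rank_cut_sites(candidates))
print(rank_cut_sites(candidates, goal="preserve"))
```

In practice a researcher would weigh a ranking like this alongside other considerations, so the ordering is a starting point rather than a decision.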
"Gene editing is a fast-changing field, and as scientists increasingly look to CRISPR to aid in disease treatment, it will be critical to make gene editing as accurate and safe as possible," said Zou. "Our work demonstrates that machine learning algorithms can help us better understand the behavior of DNA repair and improve the precision and safety of gene editing."