This framework integrates artificial intelligence with structural biology, biochemistry, Newtonian mechanics, and quantum chemistry principles to identify mutations in Janus kinase 2 (JAK2) and other proteins. Protein misfolding and dysfunction are fundamental processes underlying numerous human diseases. In particular, abnormal protein function, such as constitutive activation of kinases, can dysregulate signaling pathways and potentially cause uncontrolled cell proliferation and survival. The Janus kinase family (JAK1-3 and TYK2) is one example: single point mutations, or multiple synergistic mutations, can cause dysfunction directly implicated in diseases such as blood cancers.
Protein engineering is a powerful approach with demonstrated impact in medicine, molecular biology, and translational science. However, conventional protein engineering requires lengthy laboratory experimentation to discover critical mutation targets. In silico mutation screening methods offer a faster way to assess the impact of mutations, but their learning models rely on small, labeled datasets and have limited accuracy. These shortcomings slow drug development and hinder the ability of pharmaceutical companies to design more precise therapies.
Researchers at the University of Florida have developed an artificial intelligence framework that identifies how JAK2 mutations affect protein stability and drug binding. The system integrates genetic and structural information to uncover relationships that traditional models often miss. By predicting how specific mutations influence the success of JAK2-targeted therapies, the framework can shorten the drug discovery process, reduce development costs, and accelerate the creation of more effective treatments for patients.
AI framework for drug discovery that systematically characterizes the structural impact of mutations in the JAK2 protein
This artificial intelligence framework draws on a large dataset of detailed genetic sequences paired with their associated protein structures. The dataset is cleaned, self-labeled, and used to train both a transformer and a convolutional neural network (CNN). The transformer captures dependencies and contextual information in the genetic sequences, while the CNN detects patterns in the 3D protein structure data. Next, the model applies a mechanism that captures the interactions and dependencies between distant proteins. Finally, the model is fine-tuned to predict whether mutations stabilize or destabilize the protein structure. The output is a list of mutations, each with the probability that it is stabilizing or destabilizing.
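As a rough illustration of the two-branch design described above, the Python sketch that follows pairs a single-head self-attention encoder for the amino-acid sequence with a single 3D convolution over a voxelized structure grid. It joins the two feature vectors and outputs a stabilizing/destabilizing probability for each candidate mutation. This is a hypothetical sketch, not the University of Florida implementation. Every class and function name, the layer sizes, the voxel-grid input, and the concatenation-based fusion are assumptions. The fusion in particular stands in for the interaction mechanism, which the description does not specify. The weights are random rather than trained on self-labeled data, and the toy peptide is not the real JAK2 sequence.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
rng = np.random.default_rng(0)        # fixed seed; weights are random, NOT trained


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def one_hot(seq):
    """Encode an amino-acid string as a (len(seq), 20) one-hot matrix."""
    return np.eye(len(AMINO_ACIDS))[[AMINO_ACIDS.index(a) for a in seq]]


class SequenceBranch:
    """Transformer-style encoder (hypothetical): one head of scaled dot-product self-attention."""

    def __init__(self, d_model=16):
        self.embed = rng.normal(0, 0.1, (len(AMINO_ACIDS), d_model))
        self.wq, self.wk, self.wv = (rng.normal(0, 0.1, (d_model, d_model)) for _ in range(3))

    def __call__(self, seq):
        x = one_hot(seq) @ self.embed                      # (L, d_model)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))     # (L, L): each residue attends to all others
        return (attn @ v).mean(axis=0)                     # mean-pool to (d_model,)


class StructureBranch:
    """CNN-style encoder (hypothetical): one 3D convolution (cross-correlation, as in
    deep-learning libraries), ReLU, then global max pooling."""

    def __init__(self, n_filters=8, k=3):
        self.k = k
        self.kernels = rng.normal(0, 0.1, (n_filters, k, k, k))

    def __call__(self, grid):
        # All k*k*k patches of the voxel grid: shape (D-k+1, H-k+1, W-k+1, k, k, k)
        patches = np.lib.stride_tricks.sliding_window_view(grid, (self.k,) * 3)
        maps = np.einsum("dhwijk,fijk->fdhw", patches, self.kernels)  # one feature map per filter
        return np.maximum(maps, 0).reshape(len(self.kernels), -1).max(axis=1)  # (n_filters,)


class StabilityClassifier:
    """Concatenate both branches (assumed fusion) and apply a 2-way softmax head."""

    def __init__(self, d_model=16, n_filters=8):
        self.seq_branch = SequenceBranch(d_model)
        self.struct_branch = StructureBranch(n_filters)
        self.w = rng.normal(0, 0.1, (d_model + n_filters, 2))
        self.b = np.zeros(2)

    def predict(self, seq, grid):
        h = np.concatenate([self.seq_branch(seq), self.struct_branch(grid)])
        return softmax(h @ self.w + self.b)  # [P(stabilizing), P(destabilizing)]


def score_mutations(model, seq, grid, mutations):
    """Score (0-based position, wild type, mutant) tuples; most destabilizing first."""
    results = []
    for pos, wt, mt in mutations:
        assert seq[pos] == wt, f"wild-type mismatch at position {pos}"
        mutant_seq = seq[:pos] + mt + seq[pos + 1:]
        p_stab, p_destab = model.predict(mutant_seq, grid)
        results.append((f"{wt}{pos + 1}{mt}", float(p_stab), float(p_destab)))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    model = StabilityClassifier()
    toy_seq = "MKVLAGTEVHFRQD"        # hypothetical peptide, not the JAK2 sequence
    toy_grid = rng.random((8, 8, 8))  # placeholder for a voxelized 3D structure
    for name, ps, pd in score_mutations(model, toy_seq, toy_grid, [(2, "V", "F"), (5, "G", "P")]):
        print(f"{name}: P(stabilizing)={ps:.2f}, P(destabilizing)={pd:.2f}")
```

Because the weights are untrained, the printed probabilities carry no biological meaning. In practice, both branches would be trained on the labeled dataset and then fine-tuned on stability labels. The structure input would also come from the actual 3D model rather than random noise.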