Published by the Students of Johns Hopkins since 1896
January 20, 2025

Tommi Jaakkola presents on the power of generative AI in molecular sciences

By MIHIR RELAN | October 25, 2024

COURTESY OF MIHIR RELAN

Jaakkola discusses the applications of generative AI to molecular biology.

The Department of Computer Science hosted Tommi Jaakkola, a professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and the Institute for Data, Systems and Society, on Tuesday, Oct. 15. In his talk, titled “Generative AI for (Molecular) Sciences,” Jaakkola highlighted the advancements in generative artificial intelligence (AI) for molecular science and material design. 

Jaakkola began by discussing how the growing use of AI and machine learning models to reframe scientific challenges offers new approaches to traditional problems. He explained that generative models are a type of AI that can create new things based on patterns they’ve learned from existing data. For example, these models can predict how molecules will interact and how proteins will fold (see AlphaFold by Google DeepMind), or even design new materials. These models make it easier to generate complex structures that would normally be very difficult and time-consuming to work out using traditional computational methods.

“Everybody is talking about generative AI in some sense,” Jaakkola said. “Our work cuts across molecular sciences in terms of this generative aspect, starting from small molecules, optimizing small molecules and generating small molecules.”

He continued by discussing the role of generative AI in molecular docking, the step in drug design that predicts how small molecules bind to protein targets. Traditionally, docking has relied on computationally expensive search methods that must test millions of possible configurations to find the best fit. Jaakkola explained that generative AI can streamline this process by directly predicting the most likely binding conformations for these molecules, greatly reducing the time and resources that docking requires.

“You can certainly try millions of different possible poses,” he said. “You can try them out and cleverly search [for the best pose]. But that’s extremely slow because of the degrees of freedom. We can solve this problem using machine learning methods to accelerate the search.”
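
To see why exhaustive search is slow, consider a rough count. The sketch below is an illustration under stated assumptions, not a method from the talk: it assumes each degree of freedom of a small molecule (three for translation, three for rotation and one per rotatable bond) is sampled at the same fixed number of grid values, so the number of candidate poses grows exponentially with the degrees of freedom. The helper `pose_count` and the choice of 10 values per degree of freedom are hypothetical.

```python
def pose_count(values_per_dof: int, torsions: int) -> int:
    """Grid poses for a ligand with 6 rigid-body degrees of freedom plus `torsions` rotatable bonds.

    Hypothetical helper: assumes every degree of freedom is sampled at the same number of values.
    """
    rigid_body_dof = 6  # 3 translational + 3 rotational degrees of freedom
    return values_per_dof ** (rigid_body_dof + torsions)


# Each added rotatable bond multiplies the search space by another factor of values_per_dof.
for torsions in (0, 2, 4, 8):
    print(torsions, pose_count(10, torsions))
```

Even at this coarse resolution, a molecule with eight rotatable bonds yields 10^14 poses, which is why a learned model that proposes likely poses directly can be much faster than scoring every candidate.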

Jaakkola then introduced another area where generative AI can be applied: protein structure design. Proteins, composed of long chains of amino acids, are flexible and can fold into various shapes, each serving a specific function in biological systems. He explained how diffusion models, a type of generative AI, work by adding random distortions, or “noise,” to a protein’s structure. The model then learns to reverse this process by gradually removing the noise, allowing it to simulate how proteins fold into stable shapes.

“You start adding noise to the compound, and it wanders away, and, ultimately, you are in a noisy initial configuration here,” Jaakkola said. “But since you know how much noise you added every step of the way, so you can learn to reverse this process. You can learn these de-noising steps.”
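
The point that knowing the added noise makes the process learnable can be sketched in a few lines. This minimal NumPy example assumes a DDPM-style formulation (a linear noise schedule over T = 1000 steps, a common choice in the diffusion literature) applied to hypothetical 3D atom coordinates; it is not the model described in the talk. In training, a neural network would be fit to predict the noise from the noisy coordinates; here the true noise is plugged in to show that the noising step can then be inverted exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "structure": 3D coordinates of 5 atoms.
x0 = rng.normal(size=(5, 3))

# Assumed linear noise schedule; alpha_bars[t] is the fraction of signal left after step t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)


def add_noise(x0, t, rng):
    """Forward process: jump straight to step t, returning the noisy coordinates and the noise used."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


def estimate_clean(xt, t, eps_pred):
    """Invert the forward formula given a prediction of the noise."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])


def denoising_loss(eps_pred, eps):
    """Mean squared error a noise-predicting network would be trained to minimize."""
    return np.mean((eps_pred - eps) ** 2)


t = T - 1
xt, eps = add_noise(x0, t, rng)
# Stand-in for a trained network: use the true noise, so the loss is zero and inversion is exact.
x0_hat = estimate_clean(xt, t, eps)
print(denoising_loss(eps, eps), np.allclose(x0_hat, x0))
```

In practice the network’s noise estimate is imperfect, which is why sampling removes noise gradually over many steps rather than in one jump.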

Jaakkola then pivoted to a discussion of flow-based models, another type of generative AI that transforms molecules from a noisy, distorted state to a clean, fully formed structure. Whereas diffusion models add and then remove noise, flow-based models follow a direct trajectory from the noisy state to the clean one. These models help researchers predict the 3D conformations of proteins, enabling them to see how small molecules might bind to specific sites on these proteins. This approach helps streamline the drug development process by offering more accurate predictions, since there is a targeted direction to move toward when de-noising a protein. It also enables researchers to design new proteins with desired functions, such as catalyzing a chemical reaction or targeting specific cells in the body.

“If you take a clean structure at one end and you look at its complete noisy sample at the other end, you can take a linear trajectory from a noisy [structure] to a clean one,” he added.
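
The linear trajectory Jaakkola described can be written as an interpolation between a noise sample and a clean structure. The sketch below assumes a straight-line (rectified-flow-style) formulation on hypothetical 3D coordinates, which may differ from the exact models discussed in the talk. A trained network would predict the velocity from the current coordinates and time; here the exact velocity of the straight path stands in for that prediction, so a few Euler steps carry the noisy sample to the clean one.

```python
import numpy as np

rng = np.random.default_rng(1)
x_clean = rng.normal(size=(5, 3))  # hypothetical clean coordinates for 5 atoms
x_noise = rng.normal(size=(5, 3))  # a fully noisy starting sample


def interpolate(x_noise, x_clean, t):
    """Point at time t on the straight line from noise (t = 0) to the clean structure (t = 1)."""
    return (1.0 - t) * x_noise + t * x_clean


def target_velocity(x_noise, x_clean):
    """Velocity along the straight path; it does not depend on t. A flow model is trained to regress it."""
    return x_clean - x_noise


# Generation: integrate the velocity from t = 0 to t = 1 with Euler steps.
steps = 10
dt = 1.0 / steps
x = x_noise.copy()
for step in range(steps):
    # A trained model would predict this from (x, step * dt); the exact value is used here.
    x = x + dt * target_velocity(x_noise, x_clean)

print(np.allclose(x, x_clean))  # True
```

Because the target velocity is constant along a straight path, one way to read the “targeted direction” above is that the model learns a single heading to follow rather than many small denoising corrections.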

One of the major challenges Jaakkola highlighted when using generative AI for molecular sciences was the lack of large datasets on which to train machine learning models. The models themselves are extremely powerful, but they require vast amounts of data to learn enough for clinical applications. Researchers today are experimenting with several ideas to work around this limitation.

“You can pull up other protein data complexes from other sources, get maybe twice the size of their dataset now and you increase model capacity,” he added. “If you scale the size of the model and you scale the amount of data you feed it with, you get a tremendous boost in accuracy.”

Jaakkola concluded his talk by emphasizing that the rapid advancements in generative AI hold great promise for revolutionizing molecular sciences and drug design. As datasets grow and computational methods become more efficient, the potential applications of AI to creating new materials, improving drug discovery processes and understanding biological structures will continue to expand. 
