
MIT’s AI Learns Molecular Language for Rapid Material Development and Drug Discovery

root9871
August 13, 2023

A new AI system developed by researchers at MIT and the MIT-IBM Watson AI Lab reliably predicts molecular properties from minimal data, which could greatly simplify drug and materials discovery. To generate new molecules quickly and effectively, the system uses a “molecular grammar” that it learns through reinforcement learning. The approach has proven remarkably effective even on datasets with fewer than 100 samples.

This AI system only needs a small amount of data to predict molecular properties, which could speed up drug discovery and material development.

Discovering new materials and drugs through manual trial and error can take decades and cost tens of millions of dollars. To narrow down the number of compounds that must be synthesized and tested in the lab, scientists often use machine learning to predict molecular properties.

Researchers at MIT and the MIT-IBM Watson AI Lab have now developed a unified framework that substantially improves both molecular property prediction and molecule generation.

To accurately predict a molecule’s biological or mechanical properties, researchers typically need to train a machine-learning model on millions of labeled molecular structures. Because discovering compounds is expensive and hand-labeling millions of structures is difficult, such large training datasets are often hard to obtain, which limits the effectiveness of machine-learning approaches.

In contrast, the MIT team’s method accurately predicts molecular properties from only a small amount of data. Their system learns the rules that dictate how building blocks combine to form valid molecules. Because these rules capture the similarities between molecular structures, they help the system generate new molecules and predict their properties efficiently.

Given a dataset with fewer than 100 samples, their method reliably predicted molecular properties and generated viable molecules, and it outperformed existing machine-learning approaches on both small and large datasets.


MIT and MIT-IBM Watson AI Lab researchers have created a unified framework that uses machine learning and a small amount of training data to predict molecular properties and generate novel molecules. Image courtesy of Jose-Luis Olivares/MIT.

“Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments,” explains Minghao Guo, a graduate student in electrical engineering and computer science (EECS).

Guo is joined on the paper by MIT-IBM Watson AI Lab researchers Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song ’23 and Adithya Balachandran ’23; and senior author Wojciech Matusik, an EECS professor and member of the MIT-IBM Watson AI Lab who leads the Computational Design and Fabrication Group within MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Machine Learning.

Learning the language of molecules

To get the best results from machine-learning models, scientists need training datasets containing millions of molecules with properties similar to those they hope to discover. In practice, such domain-specific datasets are usually very small. Researchers therefore take models pretrained on large datasets of general molecules and apply them to a much smaller, targeted dataset. However, because these models have not learned much domain-specific knowledge, they often perform poorly.

The MIT group decided to take a different tack. Using just a tiny, domain-specific dataset, they developed a machine-learning system that can automatically learn a molecular grammar, or the “language” of molecules. With this “grammar,” it can build functional molecules and make educated guesses about their properties.

In linguistics, grammar rules determine how words, sentences, and paragraphs are constructed. A molecular grammar works the same way: it is a set of production rules that dictate how to build molecules or polymers by combining smaller building blocks such as atoms and substructures.

One molecular grammar can represent an extremely large number of molecules, much like a language grammar can generate a large number of sentences using the same principles. The system is trained to recognize commonalities in the production rules followed by groups of molecules that share structural similarities.
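To make the analogy with language grammar concrete, here is a minimal Python sketch of a toy context-free grammar whose production rules assemble SMILES-style strings from atom symbols. The grammar, its nonterminal names (`MOL`, `CHAIN`, `ATOM`) and the `derive` helper are hypothetical illustrations, not the researchers' metagrammar or learned grammar, and the linear chains it produces ignore rings, branches and bond orders.

```python
from typing import Dict, List, Set

# Hypothetical toy grammar: each nonterminal maps to a list of alternative
# right-hand sides. Symbols that are not keys are terminals (atom symbols).
GRAMMAR: Dict[str, List[List[str]]] = {
    "MOL": [["CHAIN"]],
    "CHAIN": [["ATOM"], ["ATOM", "CHAIN"]],  # one atom, or an atom followed by a chain
    "ATOM": [["C"], ["N"], ["O"]],
}


def derive(symbol: str, depth: int) -> Set[str]:
    """Return every terminal string derivable from `symbol` using
    derivation trees no deeper than `depth` levels."""
    if symbol not in GRAMMAR:  # terminal: it derives only itself
        return {symbol}
    if depth == 0:  # out of budget before reaching terminals
        return set()
    results: Set[str] = set()
    for rhs in GRAMMAR[symbol]:
        partials = {""}
        for sym in rhs:
            expansions = derive(sym, depth - 1)
            # concatenate every partial string with every expansion of `sym`
            partials = {p + e for p in partials for e in expansions}
        results |= partials
    return results


if __name__ == "__main__":
    for depth in range(3, 7):
        print(depth, len(derive("MOL", depth)))  # 3, 12, 39, 120 strings
```

Five short rules already yield 120 distinct strings (such as `CCO`) at depth 6, and the count roughly triples with each extra level, which is the sense in which one compact grammar can describe a very large number of molecules.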

Because molecules with similar structures tend to have similar properties, the system uses its underlying knowledge of molecular similarity to predict the properties of new molecules more accurately.

According to Guo, once the system has this grammar as a representation for all the different molecules, it can use that representation to improve property prediction.
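One simple way to see how shared production rules can drive property prediction is a similarity-weighted nearest-neighbor estimate, sketched below in Python. Each molecule is described by the set of rules used in its derivation, similarity is the Jaccard overlap of those sets, and a new molecule's property is the weighted average over its most similar labeled neighbors. The rule names, the property values and the `predict_property` function are made up for illustration; the article does not specify the researchers' prediction model, and this k-nearest-neighbor scheme is a swapped-in stand-in.

```python
from typing import FrozenSet, List, Tuple

RuleSet = FrozenSet[str]


def jaccard(a: RuleSet, b: RuleSet) -> float:
    """Fraction of production rules the two molecules share."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_property(query: RuleSet,
                     labeled: List[Tuple[RuleSet, float]],
                     k: int = 2) -> float:
    """Similarity-weighted average of the k most similar labeled molecules."""
    if not labeled:
        raise ValueError("need at least one labeled molecule")
    scored = sorted(((jaccard(query, rules), value) for rules, value in labeled),
                    reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:  # no overlap with anything: fall back to a plain mean
        return sum(value for _, value in scored) / len(scored)
    return sum(sim * value for sim, value in scored) / total


if __name__ == "__main__":
    # Hypothetical labeled molecules: (rules used in derivation, property value)
    labeled = [
        (frozenset({"r1", "r2", "r3"}), 10.0),
        (frozenset({"r1", "r2", "r4"}), 12.0),
        (frozenset({"r5", "r6"}), 50.0),
    ]
    query = frozenset({"r1", "r2", "r3", "r4"})
    print(predict_property(query, labeled))  # 11.0
```

The query shares three of its four rules with each of the first two molecules and none with the third, so the estimate is drawn entirely from its structural neighbors, mirroring the idea that structurally similar molecules tend to have similar properties.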

The system learns the production rules for a molecular grammar through reinforcement learning, a trial-and-error process in which the model is rewarded for behavior that brings it closer to achieving a goal.
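The trial-and-error loop can be illustrated with REINFORCE, a basic policy-gradient reinforcement-learning update, applied to a toy choice among four candidate production rules. The reward function here is hypothetical (it pays off only for rule 2, standing in for, say, a rule that helps produce valid molecules), and the whole setup is a minimal sketch rather than the researchers' training procedure.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_rule_policy(num_rules: int,
                      reward_fn: Callable[[int], float],
                      steps: int = 2000,
                      lr: float = 0.1,
                      seed: int = 0) -> List[float]:
    """Learn a softmax policy over candidate rules with the REINFORCE update."""
    rng = random.Random(seed)
    logits = [0.0] * num_rules  # start with a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # trial: sample a rule according to the current policy
        action = rng.choices(range(num_rules), weights=probs)[0]
        reward = reward_fn(action)
        # gradient of log pi(action) w.r.t. logit j is 1[j == action] - probs[j]
        for j in range(num_rules):
            indicator = 1.0 if j == action else 0.0
            logits[j] += lr * reward * (indicator - probs[j])
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical reward: only rule 2 moves the model closer to its goal.
    probs = train_rule_policy(4, lambda rule: 1.0 if rule == 2 else 0.0)
    print([round(p, 3) for p in probs])
```

Because the reward is nonzero only when rule 2 is sampled, every update raises that rule's probability, and after a couple of thousand trials the policy puts almost all of its mass on it. Real grammar learning involves far larger sets of choices and noisier rewards, which is why computational cost becomes a concern.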

However, because there may be billions of ways to combine atoms and substructures, learning grammar production rules this way would be too computationally expensive for anything but the tiniest dataset.

To get around this, the researchers split the molecular grammar into two parts. The first is a metagrammar: a general, broadly applicable grammar that they design by hand and give to the system at the outset. The system then only needs to learn a much smaller, molecule-specific grammar from the domain dataset. This hierarchical structure speeds up the learning process.
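The two-level idea can be sketched as follows: a hand-written metagrammar lists broadly applicable rules, and a condensed domain grammar keeps only the rules that the domain molecules actually need. In this minimal Python sketch, simple frequency counting over hypothetical rule sequences stands in for the learned selection the researchers describe, and every rule name, as well as the `induce_domain_grammar` helper, is an illustrative assumption.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical hand-written metagrammar: general rules for assembling molecules.
METAGRAMMAR: Dict[str, str] = {
    "add_carbon": "extend the chain with a carbon atom",
    "add_nitrogen": "extend the chain with a nitrogen atom",
    "add_oxygen": "extend the chain with an oxygen atom",
    "add_ring6": "close a six-membered ring",
    "add_branch": "open a side branch",
    "add_double_bond": "insert a double bond",
}


def induce_domain_grammar(metagrammar: Dict[str, str],
                          derivations: List[List[str]],
                          min_count: int = 2) -> Dict[str, str]:
    """Keep only the metagrammar rules used at least `min_count` times
    across the domain dataset's derivations."""
    counts = Counter(rule for derivation in derivations for rule in derivation)
    unknown = set(counts) - set(metagrammar)
    if unknown:  # the domain grammar must specialize the metagrammar
        raise ValueError(f"rules not in metagrammar: {sorted(unknown)}")
    return {rule: metagrammar[rule]
            for rule, count in counts.items() if count >= min_count}


if __name__ == "__main__":
    # Hypothetical derivations of three molecules from a small domain dataset.
    derivations = [
        ["add_carbon", "add_carbon", "add_oxygen"],
        ["add_carbon", "add_ring6", "add_oxygen"],
        ["add_carbon", "add_ring6", "add_branch"],
    ]
    domain = induce_domain_grammar(METAGRAMMAR, derivations)
    print(sorted(domain))  # ['add_carbon', 'add_oxygen', 'add_ring6']
```

Any later search now ranges over three rules instead of six; that shrinking of the search space is the intuition behind the speed-up a hierarchical grammar provides.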

Big results, small datasets

In tests, the researchers’ method generated viable molecules and polymers and accurately predicted their properties, even with domain-specific datasets of only a few hundred samples. It also eliminates the costly pretraining that several alternative methods require.

The method performed especially well at predicting the glass transition temperature of polymers, the temperature at which a polymer changes from a rigid, glassy state to a softer, rubbery one. Obtaining this data experimentally is usually very expensive because the tests require extremely high temperatures and pressures.

The researchers halved the size of one training set, leaving only 94 samples, to test the limits of their method. Results from their model were still competitive with those of approaches trained with the full dataset.
