
THE RISE AND LIMITS OF AI IN THE WORLD OF CHEMISTRY

Nowadays, AI has reached every corner of the world, and chemistry is no exception. Researchers have been investigating the role and use of AI in speeding up chemical research. One prominent example is AlphaFold, which predicts protein structures.


Traditional chemistry faces several challenges: experiments are time-consuming and rely on trial and error, molecular interactions are complex to model, and the number of possible reaction combinations is vast. AI shows potential for overcoming these challenges. For example, DENDRAL is a system that was successfully applied to identify molecular structures from mass spectrometry data. Recent gains in processing power and the growth of automated chemistry have heightened interest in machine learning methods. One particularly captivating technique, the large language model (LLM), is attracting growing attention in chemistry.

A machine learning method has three components: data, representation and model. Data enables the identification of patterns and relationships. The representation captures the important information in the data and filters out irrelevant or corrupted data. The model learns from the data and is applied to make predictions and decisions in new scenarios.
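To make the three components concrete, here is a minimal Python sketch. Everything in it is a hypothetical toy chosen for illustration: six SMILES strings hand-labelled as alcohols (1) or alkanes (0) stand in for the data, counts of the characters C, O and N stand in for the representation, and a nearest-centroid classifier stands in for the model. Real pipelines use far larger datasets and much richer representations.

```python
import math
from collections import Counter

# Data (hypothetical toy set): SMILES strings labelled 1 = alcohol, 0 = alkane.
DATA = [("CO", 1), ("CCO", 1), ("CCCCO", 1), ("CC", 0), ("CCC", 0), ("CCCCC", 0)]

# Representation: keep only counts of a few element symbols and discard everything
# else (bonds, rings, branches). A naive character count would also count the "C"
# in "Cl"; that is acceptable here only because the toy set contains no chlorine.
SYMBOLS = ("C", "O", "N")

def represent(smiles: str) -> list[float]:
    counts = Counter(ch for ch in smiles if ch in SYMBOLS)
    return [float(counts[s]) for s in SYMBOLS]

# Model: a nearest-centroid classifier. "Training" averages the representations per label.
def fit(data: list[tuple[str, int]]) -> dict[int, list[float]]:
    grouped: dict[int, list[list[float]]] = {}
    for smiles, label in data:
        grouped.setdefault(label, []).append(represent(smiles))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }

def predict(centroids: dict[int, list[float]], smiles: str) -> int:
    # Assign the label whose centroid is closest (Euclidean distance) to the new molecule.
    vec = represent(smiles)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

model = fit(DATA)
print(predict(model, "CCCO"))  # propanol -> 1
print(predict(model, "CCCC"))  # butane -> 0
```

Swapping any one component (a learned representation instead of character counts, a neural network instead of centroids) leaves the other two unchanged, which is why the three are usually discussed separately.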



Over the years, advances in technology and research have improved chemical datasets, which now provide strong support for ML applications in chemistry. Chemical datasets are of two types: molecular-level datasets (which record structural attributes and chemical characteristics) and reaction-level datasets (which focus on process parameters and outcomes). Molecular-level datasets include molecular structures, physicochemical properties, quantum mechanical properties and biological activity, and can be used to assess a model's ability to predict molecular properties. There are four types of molecular-level datasets: common public datasets, quantum mechanics datasets, molecular dynamics datasets and molecular property datasets. Reaction-level datasets are generally constructed from sources such as USPTO, Reaxys and high-throughput experimentation (HTE) datasets such as the Suzuki coupling dataset.
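The difference between the two dataset types can be sketched as two record structures. The field names below are illustrative assumptions, not a schema used by USPTO, Reaxys or any HTE dataset. The parser reads reaction SMILES, a common text format for reaction data in which reactants, agents and products are separated by ">" and molecules within each part by ".". The Suzuki coupling of bromobenzene with phenylboronic acid to biphenyl serves as the example, and the catalyst and base passed in are typical but hypothetical conditions; no yield is invented.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MoleculeRecord:
    """Molecular-level entry: a structure plus measured or computed properties."""
    smiles: str
    properties: dict[str, float] = field(default_factory=dict)

@dataclass
class ReactionRecord:
    """Reaction-level entry: inputs, outputs and the conditions used."""
    reactants: list[str]
    agents: list[str]
    products: list[str]
    conditions: dict[str, str] = field(default_factory=dict)
    yield_percent: Optional[float] = None  # left empty when the source does not report it

def _molecules(part: str) -> list[str]:
    # Within each part, "." separates molecules. (Simplification: a "." can also join
    # the ions of a single salt, which this split would treat as separate molecules.)
    return [s for s in part.split(".") if s]

def parse_reaction_smiles(rxn: str, **conditions: str) -> ReactionRecord:
    # Reaction SMILES has the form "reactants>agents>products"; agents may be empty.
    reactants, agents, products = rxn.split(">")
    return ReactionRecord(
        _molecules(reactants), _molecules(agents), _molecules(products), dict(conditions)
    )

ethanol = MoleculeRecord("CCO")  # properties would come from the dataset itself
suzuki = parse_reaction_smiles(
    "Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1",
    catalyst="Pd(PPh3)4",
    base="K2CO3",
)
print(suzuki.reactants)  # ['Brc1ccccc1', 'OB(O)c1ccccc1']
print(suzuki.products)   # ['c1ccc(-c2ccccc2)cc1']
```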




However, to harness this data, we first have to recognize chemical species and convert conceptual models of molecules into a computer-readable form. This is done with molecular identifiers and molecular fingerprints. Examples include: 1) the Simplified Molecular Input Line Entry System (SMILES), which captures branching and ring structures and can encode stereochemistry and isotopes when necessary; and 2) the International Chemical Identifier (InChI) (Heller et al., 2015), which encodes a molecule's structural information in a layered format covering aspects such as atom connectivity, stereochemistry, isotopic composition and charge. InChI is still being developed by IUPAC and the InChI Trust.
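Fingerprints turn a molecule into a fixed-length set of bits so that molecules can be compared quickly. Real fingerprints such as ECFP/Morgan fingerprints are computed from the molecular graph (for example with a toolkit like RDKit). The sketch below is a deliberately simplified stand-in that instead hashes short character substrings of the SMILES string into a fixed number of bits. It illustrates only two ideas: a fixed-length bit representation, and Tanimoto similarity between two bit sets. Because it works on the raw string, two different SMILES for the same molecule would give different toy fingerprints, which graph-based fingerprints avoid.

```python
import zlib

def toy_fingerprint(smiles: str, n_bits: int = 1024, max_len: int = 3) -> set[int]:
    """Return the set of 'on' bit positions for all substrings of length 1..max_len."""
    bits: set[int] = set()
    for k in range(1, max_len + 1):
        for i in range(len(smiles) - k + 1):
            # crc32 is deterministic across runs, unlike Python's built-in str hash.
            bits.add(zlib.crc32(smiles[i:i + k].encode()) % n_bits)
    return bits

def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity: shared on-bits divided by all on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

ethanol, propanol, benzene = (toy_fingerprint(s) for s in ("CCO", "CCCO", "c1ccccc1"))
print(tanimoto(ethanol, propanol))  # high: propanol contains every substring of "CCO"
print(tanimoto(ethanol, benzene))   # low: aromatic lowercase SMILES shares no substrings
```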


Representation learning methods for chemical objects: Converting data into a useful representation requires information on the spatial arrangement of atoms, the three-dimensional conformation and the electronic distribution. There are two foundational model families for representation learning. Graph neural networks (GNNs) are designed for interconnected data and work through message passing, in which nodes (atoms) repeatedly exchange information with their neighbours to refine their representations.
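Message passing can be sketched in a few lines of plain Python. The update rule used here, h_v' = ReLU(W_self h_v + W_neigh * sum of neighbour features), is a generic sum-aggregation scheme chosen for illustration: GCN additionally normalises by node degree, GAT weights neighbours with learned attention, and a trained model learns the weight matrices rather than using the identity placeholders below. The graph is ethanol's heavy atoms (C-C-O) with one-hot atom features.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def matvec(w: list[list[float]], v: list[float]) -> list[float]:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def message_passing_step(h, adjacency, w_self, w_neigh):
    """One round: h_v' = ReLU(W_self h_v + W_neigh * sum of neighbour features)."""
    new_h = []
    for v, neighbours in enumerate(adjacency):
        # Aggregate: element-wise sum of the neighbours' current feature vectors.
        message = [sum(h[u][j] for u in neighbours) for j in range(len(h[v]))]
        combined = zip(matvec(w_self, h[v]), matvec(w_neigh, message))
        new_h.append([relu(a + b) for a, b in combined])
    return new_h

def readout(h):
    """Sum atom vectors into one molecule-level vector (invariant to atom order)."""
    return [sum(col) for col in zip(*h)]

# Ethanol heavy atoms C0-C1-O2 as an undirected graph (hydrogens omitted).
adjacency = [[1], [0, 2], [1]]
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one-hot features: [is_carbon, is_oxygen]
identity = [[1.0, 0.0], [0.0, 1.0]]         # placeholder weights; a real GNN learns these

h1 = message_passing_step(h0, adjacency, identity, identity)
h2 = message_passing_step(h1, adjacency, identity, identity)
print(h1)           # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
print(readout(h2))  # [12.0, 5.0]
```

After one round, the first carbon (atom 0) still has a zero oxygen component; after two rounds it does not, showing how repeated message passing spreads information across bonds.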

Common GNN architectures for molecules include GCN (graph convolutional network) and GAT (graph attention network). 2D GNNs handle connectivity, while 3D GNNs capture spatial structure. The second family is transformer-based representation learning: transformers capture long-range dependencies in sequences, which are important for modelling complex chemical structures. Two training strategies, pre-training and fine-tuning, make transformers highly flexible for chemical tasks. Transformers can also be applied across different experimental categories and prediction tasks in chemistry; for example, RXNFP (reaction fingerprints) provides reaction-level embeddings that are useful for classifying and visualising chemical reactions.
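The reason transformers handle long-range dependencies is that self-attention compares every token with every other token directly, regardless of how far apart they are in the sequence. The sketch below assumes a simplified SMILES tokenizer (a hypothetical regular expression, not the one used by any particular model), one-hot token embeddings, and a single attention head with no learned query/key/value projections. A real transformer learns those projections and stacks many such layers.

```python
import math
import re

# Simplified SMILES tokenizer (assumption for this sketch): bracket atoms, Br/Cl,
# two-digit ring closures, single letters, digits and bond/branch symbols.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#+\-/\\@.:~*$]")

def tokenize(smiles: str) -> list[str]:
    return TOKEN_RE.findall(smiles)

def one_hot_embed(tokens: list[str]) -> list[list[float]]:
    vocab = sorted(set(tokens))
    return [[1.0 if t == v else 0.0 for v in vocab] for t in tokens]

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: list[list[float]]):
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    weights, outputs = [], []
    for q in x:
        # Each query is scored against every key, near or far in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return outputs, weights

tokens = tokenize("CC(=O)Cl")
out, attn = self_attention(one_hot_embed(tokens))
print(tokens)       # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(attn[0][-1])  # non-zero weight from the first token to the last one
```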


Molecular design in chemistry: The goal is to design molecules with desired properties, e.g. drugs that bind selectively to biological targets. There are two main approaches to de novo molecular design: 1) deep generative methods and 2) combinatorial optimization methods. Deep generative methods aim to generate novel molecules with desired properties, using three types of models: SMILES-based generative models (easier to train, but decoding errors can produce invalid molecules), graph-based generative models (which work directly on molecular graphs) and diffusion models (the newest trend; popular in image generation and now applied to molecules, where they are powerful for 3D structure generation and fragment linking).

Combinatorial optimization methods aim to maximize desired properties while exploring a diverse chemical space. They include genetic algorithms (which maintain a population of molecules and are good for exploration), reinforcement learning (sequential and flexible, and able to enforce synthesizability) and differentiable optimization (which reformulates the search over a molecular scaffold tree as a differentiable optimization problem, making it faster and more efficient).
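A genetic algorithm can be sketched on a toy problem. Here a "molecule" is assumed to be a choice of substituent at each of four positions on a fixed scaffold, and the fitness is a hypothetical additive score per fragment (TOY_SCORE is invented for illustration and does not represent any real property). Real GA-based molecular design mutates SMILES strings or molecular graphs and scores candidates with a property predictor. The loop shows the standard ingredients: a population, tournament selection, one-point crossover, random mutation, and elitism (the best individual is always carried over).

```python
import random

FRAGMENTS = ["H", "CH3", "OH", "NH2", "F", "Cl"]
# Hypothetical per-fragment scores standing in for a property predictor.
TOY_SCORE = {"H": 0.0, "CH3": 0.5, "OH": 2.0, "NH2": 1.5, "F": 1.0, "Cl": 0.2}
N_SLOTS = 4  # substitution positions on the (assumed) fixed scaffold

def fitness(genome: list[str]) -> float:
    return sum(TOY_SCORE[f] for f in genome)

def tournament(pop, rng, k=3):
    # Pick k random individuals and keep the fittest.
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    # One-point crossover: head of one parent, tail of the other.
    cut = rng.randrange(1, N_SLOTS)
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate=0.2):
    # Replace each slot with a random fragment with probability `rate`.
    return [rng.choice(FRAGMENTS) if rng.random() < rate else f for f in genome]

def run_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(FRAGMENTS) for _ in range(N_SLOTS)] for _ in range(pop_size)]
    history = [fitness(max(pop, key=fitness))]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism keeps the best-so-far unchanged
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
        history.append(fitness(max(pop, key=fitness)))
    return max(pop, key=fitness), history

best, history = run_ga()
print(best, fitness(best))
```

Because of elitism the best fitness never decreases between generations, while mutation and crossover keep exploring other combinations, which is the exploration property mentioned above.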


Retrosynthesis is used to design synthetic routes to target molecules. It is critical for translating designed molecules into real-world production and is important for scalability. Retrosynthesis has two main approaches: 1) template-based methods (strength: chemically interpretable; limitation: cannot easily discover novel or unconventional reactions) and 2) template-free methods (strength: more flexible and able to generalize beyond predefined transformations; limitation: less interpretable than template-based methods, with a higher risk of chemically invalid suggestions).
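The recursive route search behind template-based retrosynthesis can be sketched as follows. Real tools encode templates as substructure rewrite rules (for example reaction SMARTS) and apply them to the molecular graph; here, as a simplifying assumption, each template has already been reduced to a lookup from a named product to its precursors, so only the search is shown. The three disconnections (O-acetylation of salicylic acid to aspirin, N-acetylation of 4-aminophenol to paracetamol, reduction of 4-nitrophenol to 4-aminophenol) are textbook reactions; the TEMPLATES and STOCK tables themselves are hypothetical.

```python
from typing import Optional

# Hypothetical mini template library: product -> alternative precursor sets.
TEMPLATES = {
    "aspirin": [("salicylic acid", "acetic anhydride")],     # O-acetylation
    "paracetamol": [("4-aminophenol", "acetic anhydride")],  # N-acetylation
    "4-aminophenol": [("4-nitrophenol",)],                   # nitro reduction
}
# Hypothetical set of purchasable starting materials.
STOCK = {"salicylic acid", "acetic anhydride", "4-nitrophenol"}

def plan_route(target: str, depth: int = 0, max_depth: int = 5) -> Optional[dict]:
    """Return a nested route (make/buy steps), or None if no route is found."""
    if target in STOCK:
        return {"buy": target}
    if depth >= max_depth:
        return None
    for precursors in TEMPLATES.get(target, []):
        subroutes = [plan_route(p, depth + 1, max_depth) for p in precursors]
        if all(r is not None for r in subroutes):  # every precursor must be reachable
            return {"make": target, "from": subroutes}
    return None

print(plan_route("paracetamol"))
print(plan_route("caffeine"))  # None: no template covers it
```

The `None` for a target outside the template library mirrors the limitation listed above: a template-based planner cannot propose a step that is not already in its rulebook.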



CHALLENGES FACED BY AI:

  • Data scarcity: Datasets are limited in size, which increases the risk of overfitting and ultimately results in poor generalization.

  • Data bias: Human biases in which experiments are run and reported (for example, negative data from failed experiments are rarely reported) lead to variations in data accuracy and distribution. One way this has been addressed is by reconstructing unsuccessful synthesis attempts of metal-organic coordination networks through a robotic synthesis procedure.

  • Interpretability: AI is progressing rapidly in chemistry, but interpreting its results has received the least attention. For example, it is difficult to understand the rationale behind the predictions and decisions of current AI methods, especially neural networks.


CONCLUSION:

The paper provides a thorough introduction to AI in chemistry from a data-driven perspective. It summarizes the recent status of AI in chemistry at three levels (data, representation, and machine learning models for different uses) and concludes by focusing on a few of the difficulties the field faces today.

REF:

Journal reference: Frontiers of Computer Science (2025)


 
 
 
