SAMURAI - NIMS Researchers Database

Profile: Guillaume Lambard

Research Content

Affiliated Societies

The Materials Research Society of Japan (MRS-J)

Center for Basic Research on Materials
Title

SMILES-X: autonomous molecular compounds characterization for small datasets without descriptors

Keywords

Cheminformatics, Small molecules, SMILES, Natural language processing, Machine learning, Attention mechanism, Small datasets

Overview

Key Takeaways:
• The SMILES-X is an autonomous pipeline that uses machine learning to predict physicochemical properties of molecular compounds, overcoming the challenges of small datasets and the need for task-specific descriptors.
• The SMILES-X achieves state-of-the-art results in predicting aqueous solubility, hydration free energy, and octanol/water distribution coefficient of molecular compounds.
• The SMILES-X is a valuable tool for materials scientists and chemists, providing interpretable predictions and improving the accuracy of physicochemical property inference.

Novelty and Originality

• The SMILES-X is an autonomous pipeline for molecular compounds characterization
• The SMILES-X is based on a neural architecture with a data-specific Bayesian hyper-parameters optimization
• The attention mechanism in the SMILES-X enables the interpretation of output predictions
• The SMILES-X shows state-of-the-art results in the inference of aqueous solubility, hydration free energy, and octanol/water distribution coefficient of molecular compounds
• The source code for the SMILES-X is available at https://github.com/Lambard-ML-Team/SMILES-X
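
Because the pipeline reads SMILES strings as text rather than as hand-crafted descriptors, a first step of any such approach is to split each string into tokens. The following is a minimal sketch of that idea, assuming a simple regular-expression tokenizer; the pattern and the function name `tokenize` are illustrative assumptions and are not taken from the SMILES-X source code, whose actual tokenizer may differ.

```python
import re
from typing import List

# Hypothetical token pattern (an assumption, not the SMILES-X tokenizer):
# bracket atoms such as [NH4+], two-letter halogens, two-digit ring
# closures (%NN), then any remaining single character (atoms, bonds,
# branch parentheses, ring-closure digits).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, scanning left to right."""
    return SMILES_TOKEN.findall(smiles)


tokens = tokenize("Cc1ccc(O)cc1C")
print(tokens)
```

Joining the tokens back together reproduces the input string, which is a quick sanity check that nothing was dropped during tokenization.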

Details

• The SMILES-X is a novel approach that tackles both the issue of small datasets and the difficulty of developing task-specific descriptors, making it a valuable asset in the toolkit of materials scientists and chemists.
• The SMILES-X can be used in various applications such as drug discovery, material design, and chemical synthesis. It can help researchers predict the physicochemical properties of molecular compounds accurately and efficiently, which can save time and resources.
• The marketability of the SMILES-X depends on how accurately it predicts the physicochemical properties of molecular compounds and on its ease of use. If the method proves accurate and efficient, it could be a valuable tool for researchers in materials science and related fields.
• A potential exit for the SMILES-X is either licensing the technology to companies in materials science and related fields, or developing a software tool that researchers can use to predict the physicochemical properties of molecular compounds.

F1: The SMILES-X pipeline
F2: Fixed skeleton of the neural architecture in the SMILES-X
F3: Visualisation of the importance of each token within the SMILES towards the final prediction of the property of interest. The illustration uses the structure Cc1ccc(O)cc1C from the FreeSolv [31] dataset, with hydration free energy as the corresponding property. The 1D (a) and 2D (b) attention maps show the projections of the attention vector α on the SMILES string and molecular graph, respectively. The redder and darker the colour, the stronger the attention on a given token. The temporal relative distance is shown in (c). The closer the distance value is to zero, the closer the intermediate prediction on the SMILES fragment is to the prediction for the whole SMILES.
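
The two quantities described in the F3 caption can be illustrated with a small sketch. It assumes that the attention vector α is a softmax over per-token scores, and that the temporal relative distance compares the prediction on each SMILES prefix with the prediction on the whole string, normalised by the latter; both are assumptions made for illustration, not the definitions used in the SMILES-X code. The predictor `toy_predict`, its per-token contributions, and the raw scores are hypothetical placeholders and carry no chemical meaning.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw per-token scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(tokens: Sequence[str],
                  scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each token with its attention weight (a 1D attention map)."""
    if len(tokens) != len(scores):
        raise ValueError("one score per token is required")
    return list(zip(tokens, softmax(scores)))


def temporal_relative_distance(
        tokens: Sequence[str],
        predict: Callable[[Sequence[str]], float]) -> List[float]:
    """Assumed form: (prediction on prefix - full prediction) / |full prediction|."""
    full = predict(tokens)
    if full == 0:
        raise ValueError("full prediction is zero; distance is undefined")
    return [(predict(tokens[:n]) - full) / abs(full)
            for n in range(1, len(tokens) + 1)]


# Made-up per-token contributions, NOT real hydration free energies.
TOY_CONTRIBUTION = {"C": 0.5, "c": -0.25, "O": -2.0}


def toy_predict(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a trained model: sum of token contributions."""
    return sum(TOY_CONTRIBUTION.get(t, 0.0) for t in tokens)


# Every token in this particular SMILES is a single character.
tokens = list("Cc1ccc(O)cc1C")
scores = [1.0 if t == "O" else 0.0 for t in tokens]  # hypothetical raw scores

alpha_1d = attention_map(tokens, scores)
distances = temporal_relative_distance(tokens, toy_predict)

for (token, weight), dist in zip(alpha_1d, distances):
    print(f"{token!r}: attention={weight:.3f}, distance={dist:+.3f}")
```

With any predictor, the last distance is zero by construction, since the final prefix is the whole SMILES. The earlier values show how quickly the running prediction approaches the final one as tokens are read.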

Summary

• The SMILES-X can be applied in various areas such as drug discovery, material design, and chemical synthesis to predict the physicochemical properties of molecular compounds accurately and efficiently.
• The SMILES-X can be integrated with other materials informatics methods to create a more comprehensive toolkit for researchers in materials science and related fields.
