Bibliographic Details
Title: |
GexMolGen: cross-modal generation of hit-like molecules via large language model encoding of gene expression signatures. |
Authors: |
Cheng, Jiabei1 (AUTHOR), Pan, Xiaoyong1 (AUTHOR), Fang, Yi1 (AUTHOR), Yang, Kaiyuan1 (AUTHOR), Xue, Yiming1 (AUTHOR), Yan, Qingran2 (AUTHOR) yanqingran@renji.com, Yuan, Ye1,3,4 (AUTHOR) yanqingran@renji.com |
Source: |
Briefings in Bioinformatics. Nov 2024, Vol. 25, Issue 6, p. 1-14. 14 p. |
Subject Terms: |
LANGUAGE models, DRUG discovery, BIOMOLECULES, MOLECULAR graphs, GENE expression |
Abstract: |
Designing de novo molecules with specific biological activity is an essential task since it holds the potential to bypass the exploration of target genes, which is an initial step in the modern drug discovery paradigm. However, traditional methods mainly screen molecules by comparing the desired molecular effects within the documented experimental results. The data set limits this process, and it is hard to conduct direct cross-modal comparisons. Therefore, we propose a solution based on cross-modal generation called GexMolGen (Gene Expression-based Molecule Generator), which generates hit-like molecules using gene expression signatures alone. These signatures are calculated by inputting control and desired gene expression states. Our model GexMolGen adopts a "first-align-then-generate" strategy, aligning the gene expression signatures and molecules within a mapping space, ensuring a smooth cross-modal transition. The transformed molecular embeddings are then decoded into molecular graphs. In addition, we employ an advanced single-cell large language model for input flexibility and pre-train a scaffold-based molecular model to ensure that all generated molecules are 100% valid. Empirical results show that our model can produce molecules highly similar to known references, whether feeding in- or out-of-domain transcriptome data. Furthermore, it can also serve as a reliable tool for cross-modal screening. [ABSTRACT FROM AUTHOR] |
Database: |
Business Source Index |