High-Throughput Chemical Screening with FPSim2 is Now Live on Vecura
This update enables medicinal chemists and computational researchers to perform high-throughput molecular similarity and substructure screening through a guided workflow inside Vecura, without setting up complex technical infrastructure.
What is FPSim2?
FPSim2 is a high-performance Python/C++ library for fast similarity and substructure screening of large chemical compound databases. Built on RDKit, it uses a NumPy-centric architecture and sublinear search bounds to find molecules similar to a query compound based on binary fingerprints. It is a production-tested tool, used by ChEMBL and SureChEMBL to search massive chemical datasets, with its performance-critical code running in compiled C++.
It helps users perform high-throughput virtual screening, hit expansion, and structural analogue identification. It is especially useful for medicinal chemists and computational researchers who need to quickly query millions of compounds for specific chemical features or structural similarities.
What can users do with FPSim2 on Vecura?
With FPSim2 on Vecura, users can:
- Execute Similarity Searches: Efficiently identify molecules with high Tanimoto, Dice, or Cosine similarity to a specific query.
- Perform Asymmetric Tversky Searches: Weight query and target features differently with the asymmetric Tversky index, for example to find compounds that contain most of a query's structural motifs regardless of what else they carry.
- Run Substructure Screenouts: Rapidly narrow databases down to candidate substructure matches whose fingerprints contain every bit set in the query fingerprint; these candidates can then be confirmed with a full substructure match.
- Generate Top-K Rankings: Quickly retrieve the k compounds most similar to a query from vast compound collections.
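The search modes listed above all reduce to bit operations on binary fingerprints. The sketch below illustrates the standard formulas, assuming fingerprints are stored as plain Python integers (one bit per feature); the function names are hypothetical and this is not FPSim2's own API or implementation.

```python
import math


def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    common = popcount(a & b)
    denom = popcount(a) + popcount(b) - common
    return common / denom if denom else 0.0


def dice(a: int, b: int) -> float:
    denom = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / denom if denom else 0.0


def cosine(a: int, b: int) -> float:
    denom = math.sqrt(popcount(a) * popcount(b))
    return popcount(a & b) / denom if denom else 0.0


def tversky(query: int, target: int, alpha: float, beta: float) -> float:
    """Asymmetric index: alpha weights bits only in the query, beta bits only in the target."""
    common = popcount(query & target)
    only_query = popcount(query & ~target)
    only_target = popcount(target & ~query)
    denom = common + alpha * only_query + beta * only_target
    return common / denom if denom else 0.0


def passes_screenout(query: int, target: int) -> bool:
    """Substructure screenout: every query bit must also be set in the target."""
    return (query & target) == query


query, target = 0b0011, 0b1111
print(tanimoto(query, target))           # 0.5
print(tversky(query, target, 1.0, 0.0))  # 1.0: every query bit is in the target
print(tversky(target, query, 1.0, 0.0))  # 0.5: swapping the roles changes the score
print(passes_screenout(query, target))   # True
```

With alpha = beta = 1 the Tversky index reduces to Tanimoto, and with alpha = beta = 0.5 it reduces to Dice; unequal weights are what make it useful for motif-style searches.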
What the output means
The output is a ranked list of mol_id entries, each paired with its similarity coefficient. This ranking lets users prioritize compounds for downstream experimental validation or further computational analysis.
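A ranked result of this shape, and the Top-K retrieval that produces it, can be sketched as follows. This assumes an in-memory dictionary mapping hypothetical integer mol_id values to integer fingerprints and scores with Tanimoto only; the top_k helper is illustrative, not FPSim2's own function.

```python
import heapq


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    common = popcount(a & b)
    denom = popcount(a) + popcount(b) - common
    return common / denom if denom else 0.0


def top_k(query: int, fingerprints: dict, k: int, threshold: float = 0.0):
    """Return up to k (mol_id, coeff) pairs with coeff >= threshold, best first."""
    scored = ((mol_id, tanimoto(query, fp)) for mol_id, fp in fingerprints.items())
    hits = (item for item in scored if item[1] >= threshold)
    # nlargest keeps only k items in a heap instead of sorting every hit
    return heapq.nlargest(k, hits, key=lambda item: item[1])


library = {101: 0b1111, 102: 0b0111, 103: 0b0001, 104: 0b11110000}
print(top_k(0b1111, library, k=2, threshold=0.3))
# [(101, 1.0), (102, 0.75)]
```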
This output should be used to support scientific decision making. It does not replace experimental validation.
Why this matters
In drug discovery, rapidly scanning chemical libraries with millions of entries is essential for identifying novel leads, and slow searches quickly become a bottleneck. Traditional brute-force searching, which compares the query against every compound, can be computationally prohibitive and delay the hit-to-lead process. FPSim2 applies the Swamidass & Baldi bounds to skip compounds whose fingerprint bit counts make the similarity threshold unreachable, drastically reducing computational overhead and allowing researchers to explore chemical space at scale.
Integrating FPSim2 into Vecura makes this kind of large-scale screening available through a streamlined workflow, so researchers do not need to manage high-performance computing infrastructure or compile HDF5 fingerprint databases manually.
- Developed by: ChEMBL Group
- Source: Official GitHub Repository
- Reference: Documentation
Try FPSim2 on Vecura
Open the model workspace and start evaluating with your own inputs.