Can AI Replace 45 Hours of Manual Pose Inspection? A Covalent Docking Comparison

Jun 12, 2026

Background

Covalent inhibitors occupy a unique space in drug discovery. By forming covalent, often irreversible, bonds with target residues, they can achieve sustained target engagement and prolonged pharmacological activity beyond what is often possible with non-covalent ligands. However, optimizing covalent scaffolds presents a distinct challenge: favorable binding affinity alone is insufficient. Productive activity depends on precise geometric alignment between the reactive warhead and the target nucleophile.

Key principle: Covalent inhibitor binding is a two-step process: (1) non-covalent recognition positions the scaffold in the binding pocket, and (2) covalent bond formation requires the warhead to adopt reactive geometry relative to the target nucleophile. Both steps must succeed for productive covalent engagement. A ligand with excellent pocket fit but misaligned warhead is merely a non-covalent binder; a ligand with correct warhead geometry but poor pocket fit may never reach the residue to react.

Figure 1. Principle of covalent drugs: both non-covalent recognition (Step 1) and covalent bond formation (Step 2) must succeed. Adapted from MuseChem blog: The Rise of Covalent Drugs: How to Discover Potential Covalent Drugs.

This two-step requirement creates a practical bottleneck during computational lead optimization. Docking pipelines can generate multiple plausible poses for each analog, yet identifying the pose most compatible with covalent engagement often requires manual inspection by an experienced researcher. Critically, the two steps are evaluated by different criteria: non-covalent recognition is assessed by binding complementarity (pocket fit, hydrogen bonds, hydrophobic contacts), while covalent bond formation is assessed by geometric parameters (warhead-to-nucleophile distance, approach angle, steric accessibility). A pose selection strategy that only optimizes one step may systematically miss poses that satisfy the other.

This note describes a comparison designed to answer a practical question: can an AI agent reliably replace manual pose inspection in a covalent docking workflow, and how should such performance be measured when the two approaches are optimizing for different steps of the same process?

The System

Target: NF45 (Interleukin Enhancer-Binding Factor 2, ILF2), Chain A of PDB structure 4AT7.

Figure 2. Three-dimensional surface representation of the protein target. NF45 (Chain A) is colored teal and NF90 (Chain B) is colored orange.

NF45 forms a stable heterodimer with NF90, and together they constitute a key RNA-binding complex implicated in post-transcriptional gene regulation. The binding site of interest is a basic patch on NF45 (residues 104–127) identified by our wet lab as a functional RNA-binding interface. Disrupting this interface is the therapeutic hypothesis.

The ε-amine (NZ) of LYS110 can act as a nucleophile, undergoing Michael addition with α,β-unsaturated carbonyl warheads under physiological conditions.

The parent ligand is a sesquiterpene lactone containing an α-methylene-γ-butyrolactone warhead. In this system, the exo-methylene (=CH₂) serves as the electrophilic β-carbon of the Michael acceptor. Productive covalent engagement, therefore, requires the warhead to adopt a binding geometry that places the exo-methylene within reactive distance and orientation of LYS110 NZ. To explore structure-activity relationships while preserving the reactive pharmacophore, analogs were generated by fragment growth at two solvent-exposed hydroxyl positions (C1-OH and C8-OH) on the parent scaffold.

Figure 4. Starting scaffold used in the workflow. DeepFrag-generated analogs were created by decorating the hydroxyl positions (green circles) while preserving the α-methylene-γ-lactone pharmacophore (red circle).

Workflow

The DeepFrag-generated analogs were first filtered using RDKit to ensure chemically valid structures and remove compounds failing basic quality checks. The remaining molecules were docked into NF45 using SigmaDock, with up to five poses retained per analog.

Pose selection represents the central challenge of this workflow. Conventional virtual screening pipelines typically prioritize the top-scoring docking pose; however, for covalent inhibitors, this assumption is often insufficient. A favorable docking score does not necessarily indicate that the α-methylene-γ-butyrolactone warhead is positioned appropriately for nucleophilic attack by LYS110. Conversely, a pose with correct warhead geometry may receive a lower docking score if the scaffold makes fewer conventional non-covalent contacts. This tension between Step 1 (non-covalent recognition) and Step 2 (covalent geometry) is the fundamental challenge that motivates comparing two different pose selection strategies.

Figure 5. AI-Assisted Covalent Docking Workflow

To address this, the Vecura AI Agent evaluates all generated poses rather than simply accepting the highest-scoring conformation. For each ligand, the agent assesses three quantitative criteria - the distance between LYS110 NZ and the =CH₂ warhead, the approach angle consistent with Michael addition geometry, and the presence of severe steric clashes - and selects the pose with the shortest NZ→=CH₂ distance, annotating each selection with a confidence tier reflecting how well it satisfies these criteria.

Agent-selected poses were subsequently compared against human-selected poses. Selected poses will be rescored using Uni-GBSA (EM mode, GB solvation) as a downstream validation step. It is important to note that MM-GBSA methods score non-covalent complementarity only (van der Waals, electrostatics, polar and nonpolar solvation); they do not evaluate covalent bond formation. Therefore, Uni-GBSA provides an independent assessment of Step 1, while the agent’s geometric criteria assess Step 2. These are complementary evaluations, not competing ones, and agreement between them strengthens confidence in a pose while disagreement flags cases requiring further scrutiny.

The Manual Approach

For each analog, all five SigmaDock poses were visually inspected using the Vecura molecular visualization interface. Poses were evaluated based on their compatibility with covalent engagement of LYS110, with particular emphasis on conformations in which the exo-methylene (=CH₂) of the α-methylene-γ-butyrolactone warhead was oriented toward the NZ atom of LYS110.

Pose selection was based on qualitative assessment of:

Orientation of the warhead relative to LYS110
Overall placement of the ligand within the binding groove
Absence of obvious steric clashes with surrounding residues

The manual approach implicitly evaluates both steps of covalent binding: the researcher assesses non-covalent pocket fit (Step 1) through visual inspection of binding complementarity, while simultaneously checking whether the warhead points toward LYS110 (Step 2). However, the primary weight in manual selection falls on Step 1 - whether the pose “looks right” in the pocket - with Step 2 serving as a qualitative filter rather than a quantitative constraint.

A key limitation of this approach is reproducibility. Manual pose selection relies on expert interpretation of three-dimensional binding geometries, and different researchers may reasonably prioritize different features - such as docking score, binding interactions, or warhead orientation. This variability is particularly apparent in borderline cases where multiple poses appear compatible with LYS110 engagement. In this study, we applied an implicit secondary heuristic: among poses passing the visual warhead check, the pose with the best (most negative) Vinardo score was preferred. This introduces a systematic difference from the agent's distance-minimizing strategy (see below).

Following pose selection, analogs were subject to a score threshold: only ligands whose selected pose achieved a Vinardo docking score below −5.5 kcal/mol - representing a meaningful improvement over the parent compound (score = −4.544 kcal/mol) - were forwarded for further evaluation. This threshold was chosen to filter out analogs whose predicted affinity gain falls within the expected noise of the Vinardo scoring function.

This process required inspection of 1,360 docking poses (272 analogs × 5 poses). Assuming approximately 2 minutes per pose, the workflow represents roughly 2,720 minutes (~45 hours) of manual review.
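The workload estimate above is simple arithmetic; a minimal sketch that reproduces it from the quoted figures follows. The 2-minute figure is the text's own assumption, and the variable names are illustrative.

```python
# Workload estimate for manual review, using the figures quoted in the text.
N_ANALOGS = 272          # analogs docked
POSES_PER_ANALOG = 5     # poses retained per analog
MINUTES_PER_POSE = 2     # assumed average inspection time per pose

total_poses = N_ANALOGS * POSES_PER_ANALOG
total_minutes = total_poses * MINUTES_PER_POSE
total_hours = total_minutes / 60

print(f"{total_poses} poses, {total_minutes} min, {total_hours:.1f} h")
```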

The Agent Approach

The Vecura AI Agent was provided with the same SigmaDock output poses and tasked with selecting the pose most geometrically consistent with productive covalent engagement of LYS110. Rather than relying on the docking score alone, the agent evaluated all generated poses using three predefined quantitative criteria.

Criterion 1 - Warhead Distance: Distance between LYS110 NZ and the exo-methylene carbon (=CH₂) of the α-methylene-γ-butyrolactone warhead ≤ 5 Å. Distances ≤ 3.5 Å were considered high confidence; 3.5–5 Å were considered acceptable.
Criterion 2 - Approach Geometry: Angle between the NZ→=CH₂ vector and the exo-methylene C=C bond axis between 80–130°, consistent with productive Michael addition geometry.
Criterion 3 - Steric Clash Filter: Poses containing severe steric clashes - defined as atom-atom overlap exceeding 0.5 Å (0.75 × summed van der Waals radii threshold) - were excluded regardless of favorable warhead positioning.
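As an illustration of how the three criteria could be evaluated for a single pose, here is a minimal Python sketch. It is not the Vecura agent's implementation. Two things are assumptions: the approach angle is taken as the NZ···Cβ=Cα angle measured at the exo-methylene carbon (the text does not fix the vector directions), and the steric clash result is passed in as a precomputed boolean. The function names and toy coordinates are hypothetical.

```python
import math

DIST_HIGH = 3.5               # Å, high-confidence cutoff (Criterion 1)
DIST_MAX = 5.0                # Å, acceptable cutoff (Criterion 1)
ANGLE_RANGE = (80.0, 130.0)   # degrees (Criterion 2)

def angle_deg(center, p, q):
    """Angle p-center-q in degrees."""
    u = [a - b for a, b in zip(p, center)]
    v = [a - b for a, b in zip(q, center)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_t = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def evaluate_pose(nz, c_beta, c_alpha, severe_clash):
    """Score one pose against the three criteria.

    nz, c_beta, c_alpha: (x, y, z) coordinates in Å of LYS110 NZ, the
    exo-methylene carbon, and its double-bond partner.
    severe_clash: result of a separate clash check (assumed given).
    """
    d = math.dist(nz, c_beta)
    theta = angle_deg(c_beta, nz, c_alpha)   # assumed angle convention
    if d <= DIST_HIGH:
        tier = "high"
    elif d <= DIST_MAX:
        tier = "acceptable"
    else:
        tier = "fail"
    angle_ok = ANGLE_RANGE[0] <= theta <= ANGLE_RANGE[1]
    return {
        "distance": d,
        "angle": theta,
        "distance_tier": tier,
        "angle_ok": angle_ok,
        "passes_all": tier != "fail" and angle_ok and not severe_clash,
    }

# Toy coordinates: NZ placed 3.0 Å from the exo-methylene carbon at 105°.
c_beta = (0.0, 0.0, 0.0)
c_alpha = (1.34, 0.0, 0.0)
t = math.radians(105.0)
nz = (3.0 * math.cos(t), 3.0 * math.sin(t), 0.0)

result = evaluate_pose(nz, c_beta, c_alpha, severe_clash=False)
print(result)
```

In practice the coordinates would come from the docked pose files and the protein structure; parsing them is outside the scope of this sketch.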

For each ligand, the agent selected the conformation with the shortest NZ→=CH₂ distance among all generated poses and annotated whether that pose satisfied the geometric and steric criteria above. The agent always returns a selection for every ligand and applies the same −5.5 kcal/mol Vinardo threshold as the manual workflow before forwarding ligands for evaluation.
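The selection rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the agent's actual code: the `Pose` record, its field names, and the example values are hypothetical, and per-pose distances, angles, and clash flags are assumed to be already computed.

```python
from dataclasses import dataclass

VINARDO_THRESHOLD = -5.5   # kcal/mol, from the text

@dataclass
class Pose:
    index: int
    distance: float   # NZ→=CH2 distance, Å
    angle: float      # approach angle, degrees
    clash: bool       # severe steric clash present
    vinardo: float    # docking score, kcal/mol

def passes_geometry(p: Pose) -> bool:
    """True if the pose meets the distance, angle, and clash criteria."""
    return p.distance <= 5.0 and 80.0 <= p.angle <= 130.0 and not p.clash

def select_pose(poses):
    """Always return the shortest-distance pose, plus its annotation."""
    best = min(poses, key=lambda p: p.distance)
    return best, passes_geometry(best)

def advances(p: Pose) -> bool:
    """Apply the Vinardo threshold (more negative is better)."""
    return p.vinardo < VINARDO_THRESHOLD

# Made-up poses for one ligand.
poses = [
    Pose(0, distance=3.9, angle=95.0, clash=False, vinardo=-5.9),
    Pose(1, distance=2.8, angle=101.0, clash=False, vinardo=-5.1),
    Pose(2, distance=6.2, angle=70.0, clash=False, vinardo=-6.3),
]
chosen, geometry_ok = select_pose(poses)
print(chosen.index, geometry_ok, advances(chosen))
```

In this toy case the agent's pick (pose 1) has good geometry but misses the score threshold, while pose 0 passes both. This is the kind of Step 1 versus Step 2 tension discussed later.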

The agent operates in batch mode, processing each ligand in under 1 second. Unlike manual inspection, the workflow is fully deterministic: identical inputs always produce identical outputs.

Comparison Design

What Is Being Compared

Table 1. Pose Selection Strategy Comparison


Method	Primary evaluation	Secondary evaluation	Tiebreaker
Human (manual)	Step 1: Non-covalent pocket fit (visual)	Step 2: Warhead orientation (qualitative)	Highest Vinardo score among plausible poses
Agent (geometric)	Step 2: Covalent geometry (quantitative)	Step 1: Docking score (proxy)	Shortest NZ→=CH₂ distance among passing poses

The two methods are not competing assessments of the same property. They are orthogonal filters that emphasize different steps of covalent binding. Agreement between them indicates a pose that satisfies both non-covalent recognition and covalent geometry. Disagreement indicates a tension between the two steps - a pose that fits the pocket well but has suboptimal warhead geometry, or vice versa.

Scoring Method

For each ligand, the human and agent selections are compared as follows:

Primary metric:

Pose agreement rate (%) = (Full agreement / N ligands) × 100

Secondary metrics:

NZ→=CH₂ distance: human-selected vs agent-selected (mean ± SD, Å)
Approach angle: human-selected vs agent-selected (mean ± SD, °)
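A minimal sketch of how these metrics could be computed is shown below. It assumes "full agreement" means both methods picked the same pose index for a ligand, and that SD is the sample standard deviation; the source does not state either. The dictionaries and values are made-up placeholders.

```python
from statistics import mean, stdev

def agreement_rate(human, agent):
    """Percent of shared ligands where both methods chose the same pose.

    human, agent: dict mapping ligand ID -> selected pose index.
    """
    shared = human.keys() & agent.keys()
    agree = sum(1 for lig in shared if human[lig] == agent[lig])
    return 100.0 * agree / len(shared)

def mean_sd(values):
    """Mean and sample standard deviation of a list of numbers."""
    return mean(values), stdev(values)

# Placeholder selections for four hypothetical ligands.
human = {"lig1": 0, "lig2": 3, "lig3": 1, "lig4": 2}
agent = {"lig1": 0, "lig2": 1, "lig3": 1, "lig4": 4}

rate = agreement_rate(human, agent)
dist_mean, dist_sd = mean_sd([3.0, 3.5, 2.5])   # placeholder distances, Å
print(f"agreement = {rate:.1f}%, distance = {dist_mean:.2f} ± {dist_sd:.2f} Å")
```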

Results

Strategy Comparison

Table 2. Strategy Agreement Across 268 Ligands


Metric	Value
Total ligands evaluated	268
Full agreement	147 (54.85%)
Disagreement	121 (45.15%)

Of the 121 disagreements, 100 (37.31% of the total) represent cases where both the human and agent selected a pose passing the geometric filter but differed on the pose index. These are Step 1 vs. Step 2 tiebreaker conflicts: both methods agreed that covalent geometry was achievable, but the human preferred the pose with better pocket fit (higher Vinardo score) while the agent preferred the pose with shorter warhead distance. Only 21 cases (7.84%) represent substantive disagreements where the pass/fail decisions differed between the two approaches - cases where one method identified a pose compatible for advancement and the other did not.

Geometric Comparison

Table 3. Geometric Properties of Human- vs. Agent-Selected Poses


Metric	Human (n = 268)	Agent (n = 268)
NZ→=CH₂ distance (Å)	3.23 ± 1.57	2.97 ± 1.30
Approach angle (°)	86.25 ± 29.10	91.19 ± 26.68
Distance pass rate	95.5%	96.6%
Angle pass rate	50.4%	58.6%
Vinardo score (kcal/mol)	−4.48 ± 1.04	−4.05 ± 1.28

Agent-selected poses show shorter mean NZ→=CH₂ distances (2.97 Å vs. 3.23 Å), a direct consequence of the agent's distance-minimizing tiebreaker. Both mean approach angles fall within the productive range for Michael addition (80–130°), with the agent's mean (91.19°) closer to the center of the range than the human's (86.25°). The ~5° angle difference is small relative to the standard deviations (~27–29°), indicating substantial overlap between the two distributions.

Despite using different selection strategies, the geometric properties of the selected poses do not differ substantially on average. Paired comparisons yield small effect sizes for both distance (Cohen's d = 0.29) and angle deviation from the productive range (Cohen's d = 0.29). The 0.26 Å distance difference falls within typical docking coordinate uncertainty, and both mean angles are within the 50°-wide productive range.

However, the two strategies diverge at the tails. The agent produces systematically more geometrically viable poses: 58.6% of agent-selected poses pass the angle criterion (80–130°) vs. 50.4% of human-selected poses (McNemar p < 0.001), and 96.6% pass the distance criterion (≤5 Å) vs. 95.5% (McNemar p = 0.002). The more consequential difference is in docking score: the agent's Step 2 optimization comes at a cost to Step 1 complementarity, with a score difference of 0.43 kcal/mol overall (Cohen's d = 0.44) and 0.91 kcal/mol among the 121 cases where the methods selected different poses.
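For readers unfamiliar with the paired statistics quoted here, the sketch below shows one common way to compute them. The source does not say which variants were used, so both choices are assumptions: an exact (binomial) two-sided McNemar test on the discordant pass/fail counts, and the paired Cohen's d defined as the mean difference divided by the standard deviation of the differences. All input numbers are made up and do not reproduce the reported values.

```python
import math
from statistics import mean, stdev

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: ligands where only method A passes; c: ligands where only
    method B passes. Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def cohens_d_paired(x, y):
    """Paired effect size: mean(x - y) / SD(x - y)."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / stdev(diffs)

# Made-up example: 2 human-only passes vs. 10 agent-only passes.
p_value = mcnemar_exact(2, 10)

# Made-up paired distances (Å) for three ligands.
d = cohens_d_paired([3.0, 4.0, 5.0], [2.0, 2.0, 4.0])
print(f"McNemar p = {p_value:.4f}, Cohen's d = {d:.2f}")
```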

Figure 6. Distribution of NZ→=CH₂ distances for human-selected and agent-selected poses.

Figure 7. Distribution of approach angles for human-selected and agent-selected poses.

Ligand Advancement Overlap

Table 4. Ligand Advancement Overlap After Vinardo Threshold


Category	Count	Interpretation
Same ligand, same pose	22	Both steps satisfied by both methods
Same ligand, different pose	3	Step 1 vs. Step 2 tension: same ligand, different binding mode
Human-only ligands	15	Agent’s pose scored above −5.5 threshold; human’s pose scored below
Agent-only ligands	0	Agent selected no ligands outside the human’s set

After applying the −5.5 kcal/mol Vinardo threshold, the human advanced 40 ligands for further evaluation while the agent advanced 25. All 25 agent-selected ligands were within the human’s 40, indicating that the agent was more conservative in ligand selection rather than selecting a different set. Among the 25 overlapping ligands, 22 had the same pose selected by both methods, while 3 had different poses.
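The four categories in Table 4 are plain set operations on the two advanced-ligand lists. The sketch below illustrates this with synthetic ligand IDs and pose indices, constructed only so that the counts match the table; they are not the real ligands.

```python
def advancement_overlap(human, agent):
    """Categorize advanced ligands.

    human, agent: dict mapping advanced ligand ID -> selected pose index.
    """
    both = human.keys() & agent.keys()
    same_pose = {lig for lig in both if human[lig] == agent[lig]}
    return {
        "same_ligand_same_pose": len(same_pose),
        "same_ligand_different_pose": len(both) - len(same_pose),
        "human_only": len(human.keys() - agent.keys()),
        "agent_only": len(agent.keys() - human.keys()),
    }

# Synthetic placeholders: 40 human-advanced ligands, 25 agent-advanced,
# of which the first 22 share the same pose index.
human = {f"lig{i}": 0 for i in range(40)}
agent = {f"lig{i}": (0 if i < 22 else 1) for i in range(25)}

overlap = advancement_overlap(human, agent)
print(overlap)
```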

Discussion

The Geometric Comparison Reflects the Docking Engine, Not the Selection Strategy

The small differences in mean distance (0.26 Å) and mean angle (~5°) between human-selected and agent-selected poses are not meaningful findings about the selection strategies themselves. Both methods select from the same pose pool - the SigmaDock output. The geometric properties of any selected pose are constrained by what the docking engine generates. If SigmaDock produces five poses for a ligand that all have similar warhead-to-LYS110 geometry, no selection strategy can produce a substantially different geometric outcome.

The distance difference (0.26 Å) is within typical docking coordinate uncertainty (~0.3–0.5 Å), and the angle difference (~5°) is well within the 50°-wide productive range (80–130°). Estimated effect sizes are small. With n = 268, paired statistical tests reach nominal significance, but the practical difference is negligible: both methods select poses with comparable warhead geometry because the pose pool itself does not offer substantially different geometric alternatives for most ligands.

The Agent Is Conservative Because Step 2 Optimization Can Cost Step 1

The agent advanced 25 ligands for further evaluation, all of which were within the human’s 40. The agent did not select any ligands the human excluded. This pattern does not mean the agent failed to find poses for the 15 excluded ligands - it always returns a selection. Rather, the agent’s poses for those 15 ligands received Vinardo scores above the −5.5 kcal/mol threshold and were filtered out. The human selected different poses for the same ligands that scored below −5.5 and were advanced.

This is a direct consequence of the agent’s Step 2 optimization. By selecting the pose with the shortest NZ→=CH₂ distance, the agent sometimes chooses a conformation where the warhead is well-positioned for covalent bond formation but the scaffold makes fewer favorable non-covalent contacts with the pocket. The resulting docking score reflects this poorer Step 1 performance. The human, optimizing primarily for Step 1, selects poses with better pocket complementarity that achieve more negative (more favorable) Vinardo scores.

Two interpretations of the 15 human-only ligands are possible:

(1) The human’s poses satisfy both steps: the warhead is still within productive covalent geometry (just not at the shortest distance), and the pocket fit is good. These are genuine covalent inhibitor candidates that the agent’s strict distance-minimizing strategy missed.
(2) The human’s poses sacrifice Step 2 for Step 1: the pocket fit is good but the warhead is too far from LYS110 or at a poor approach angle for covalent bond formation. These may be non-covalent binders that score well on docking but cannot form the covalent bond.

Resolving this ambiguity requires checking whether the human-selected poses for these 15 ligands satisfy the agent’s geometric criteria (distance ≤5 Å, angle 80-130°). If they do, interpretation (1) applies and the agent’s distance-minimizing tiebreaker is too aggressive. If they do not, interpretation (2) applies and the human’s visual assessment accepts poses with inadequate covalent geometry.
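The check described above is straightforward to automate. A minimal sketch follows, assuming the distance and angle of each human-selected pose are already available; the ligand IDs and values are made up for illustration.

```python
def classify_human_only(poses):
    """Assign each human-only ligand to interpretation 1 or 2.

    poses: dict mapping ligand ID -> (distance in Å, angle in degrees)
    for the human-selected pose.
    """
    labels = {}
    for lig, (dist, angle) in poses.items():
        geometry_ok = dist <= 5.0 and 80.0 <= angle <= 130.0
        # 1: the human pose also satisfies covalent geometry.
        # 2: the human pose trades covalent geometry for pocket fit.
        labels[lig] = 1 if geometry_ok else 2
    return labels

# Made-up examples.
human_only = {
    "ligA": (4.2, 98.0),    # within both ranges
    "ligB": (5.8, 102.0),   # too far from NZ
    "ligC": (3.6, 61.0),    # poor approach angle
}
labels = classify_human_only(human_only)
print(labels)
```

If most of the 15 ligands fall under label 1, the tiebreaker is the likely culprit; if most fall under label 2, the visual check is.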

Practical Time Saving

The manual workflow required ~45 hours of expert review for 1,360 poses. The agent processes the same input in under 5 minutes. Even if the agent’s selections require manual verification for the subset of cases with substantive disagreements, the net time saving is substantial. The more relevant question is whether the agent’s geometric criteria are well-calibrated for this specific target and warhead chemistry, which determines whether the time saving comes at the cost of missing productive covalent binders.

Conclusion

This comparison reveals that manual and agent-based pose selection for covalent inhibitors are not competing methods but complementary filters operating on different steps of the covalent binding process. The human primarily optimizes for non-covalent pocket complementarity (Step 1) with qualitative warhead awareness, while the agent primarily optimizes for covalent bond formation geometry (Step 2) with docking score as a proxy for pocket fit.

For practical workflow design, we recommend a combined approach:

Apply the agent’s geometric criteria as the primary filter for Step 2. If no pose passes, the ligand is unlikely to be a productive covalent inhibitor regardless of pocket fit.
Among poses passing covalent geometry, use visual assessment or downstream rescoring to rank by non-covalent complementarity (Step 1). This identifies poses where the scaffold is well-positioned to deliver the warhead to the residue.
For the small number of cases where Step 1 and Step 2 are in tension (same ligand, different optimal poses for each step), short MD simulations can test dynamic stability and resolve the conflict with physics-based evidence.

This combined workflow preserves the agent’s speed and geometric rigor while incorporating human chemical intuition and physics-based validation where they add the most value. The net result is a more efficient and more reliable covalent pose selection pipeline than either method alone.
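The three recommendations above can be sketched as a per-ligand triage function. This is an illustrative outline under stated assumptions rather than a tested pipeline: poses are plain dictionaries with hypothetical keys, `score` stands in for whatever Step 1 measure is used (a docking or rescoring energy where more negative is better), and any difference between the best-fitting and closest poses is flagged, with no tolerance.

```python
def passes_geometry(pose):
    """Step 2 filter: distance, angle, and clash criteria from the text."""
    return (pose["distance"] <= 5.0
            and 80.0 <= pose["angle"] <= 130.0
            and not pose["clash"])

def triage_ligand(poses):
    """Apply the combined workflow to one ligand's poses."""
    passing = [p for p in poses if passes_geometry(p)]
    if not passing:
        # No pose supports covalent engagement.
        return {"status": "deprioritize", "pose": None}
    best_fit = min(passing, key=lambda p: p["score"])      # Step 1 ranking
    closest = min(passing, key=lambda p: p["distance"])    # Step 2 optimum
    if best_fit["index"] != closest["index"]:
        # Step 1 and Step 2 disagree: candidate for short MD.
        return {"status": "flag_for_md",
                "pose": best_fit["index"],
                "alternative": closest["index"]}
    return {"status": "accept", "pose": best_fit["index"]}

# Made-up poses for one ligand.
poses = [
    {"index": 0, "distance": 3.9, "angle": 95.0, "clash": False, "score": -5.9},
    {"index": 1, "distance": 2.8, "angle": 101.0, "clash": False, "score": -5.1},
    {"index": 2, "distance": 6.2, "angle": 70.0, "clash": False, "score": -6.3},
]
decision = triage_ligand(poses)
print(decision)
```

A tolerance on the distance or score gap could be added before flagging, so that near-identical poses are not sent to MD unnecessarily.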

References

[1] García-Jacas, C. R., Green, H., Wierbowski, S. D., & Durrant, J. D. (2026). Precision fragment addition: domain-specific DeepFrag2 models for smarter lead optimization. Digital Discovery, 5(3), 1340–1350. https://doi.org/10.1039/d5dd00425j

[2] Yang, M., Bo, Z., Xu, T., Xu, B., Wang, D., & Zheng, H. (2023). Uni-GBSA: an open-source and web-based automatic workflow to perform MM/GB(PB)SA calculations for virtual screening. Briefings in Bioinformatics, 24(4). https://doi.org/10.1093/bib/bbad218

[3] Singh, J., Petter, R. C., Baillie, T. A., & Whitty, A. (2011). The resurgence of covalent drugs. Nature Reviews Drug Discovery, 10(4), 307–317. https://doi.org/10.1038/nrd3410

[4] Hillebrand, L., Liang, X. J., Serafim, R. A. M., & Gehringer, M. (2024). Emerging and Re-emerging Warheads for Targeted Covalent Inhibitors: An Update. Journal of Medicinal Chemistry, 67(10), 7668–7758. https://doi.org/10.1021/acs.jmedchem.3c01825

[5] Lagoutte, R., Patouret, R., & Winssinger, N. (2017). Covalent inhibitors: an opportunity for rational target selectivity. Current Opinion in Chemical Biology, 39, 54–63. https://doi.org/10.1016/j.cbpa.2017.05.008
