🚀 Scaling Drug Discovery with High-Throughput Docking
Over the past few months at Dana-Farber, I’ve been working on large-scale docking pipelines that screen hundreds of thousands of compounds against protein pockets relevant to multiple myeloma and other cancers.
This work relies heavily on:
• HPC clusters with hundreds of nodes running in parallel
• Automation with bash, Python, and GNU Parallel
• Data wrangling across millions of output files
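The post doesn’t show the pipeline itself, but the core pattern is simple: generate one docking command per ligand, then let GNU Parallel fan the commands out across cores and nodes. Here’s a minimal sketch of that job-generation step, assuming an AutoDock Vina-style command line; the receptor and ligand paths are hypothetical stand-ins.

```python
# Sketch: one docking command per ligand, ready to pipe into GNU Parallel.
# Assumes an AutoDock Vina-style CLI; file names here are hypothetical.
from pathlib import Path

def build_jobs(receptor: str, ligand_dir: str, out_dir: str,
               exhaustiveness: int = 8) -> list[str]:
    """Return one `vina` command line per .pdbqt ligand in ligand_dir."""
    jobs = []
    for ligand in sorted(Path(ligand_dir).glob("*.pdbqt")):
        out = Path(out_dir) / f"{ligand.stem}_out.pdbqt"
        jobs.append(
            f"vina --receptor {receptor} --ligand {ligand} "
            f"--out {out} --exhaustiveness {exhaustiveness}"
        )
    return jobs
```

Written to a `jobs.txt`, the list can then be consumed with something like `parallel -j 64 < jobs.txt` on each node, which is where most of the throughput comes from.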
As the datasets grow, interesting questions emerge:
⏱️ How does docking time scale per compound with protein size and degrees of rotational freedom?
📊 How do docking box volumes (ų) correlate with energy distributions?
⚖️ What are the trade-offs between exhaustiveness, accuracy, and throughput?

I’ve been generating statistics and visualizations (histograms, distributions of binding energies, runtime analyses per protein/compound), and it’s exciting to see patterns begin to form.
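When the outputs number in the millions, the bottleneck is often just pulling the energies out of the result files. A minimal sketch of that extraction step, assuming AutoDock Vina-style `REMARK VINA RESULT` lines in the output PDBQT files (the sample text is fabricated):

```python
# Sketch: extract best binding energies from Vina-style output files
# and bin them for a quick histogram. Sample text is fabricated.
import re
from collections import Counter

RESULT_RE = re.compile(r"REMARK VINA RESULT:\s+(-?\d+\.\d+)")

def best_energy(pdbqt_text: str) -> float:
    """Lowest (best) docked energy, in kcal/mol, found in one output file."""
    return min(float(m) for m in RESULT_RE.findall(pdbqt_text))

def histogram(energies, width=1.0):
    """Crude binning: count of compounds per `width` kcal/mol bin."""
    bins = Counter(int(e // width) * width for e in energies)
    return dict(sorted(bins.items()))

sample = """REMARK VINA RESULT:    -9.1      0.000      0.000
REMARK VINA RESULT:    -8.7      1.902      2.114
"""
print(best_energy(sample))  # -9.1
print(histogram([-9.1, -8.7, -8.2, -7.5]))
```

Mapping `best_energy` over the output directory (ideally in parallel, matching the rest of the pipeline) yields the per-compound energy table that the histograms and runtime analyses are built on.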

👉 I’m curious, for those of you working in computational biology, cheminformatics, or HPC: what kinds of metrics or analyses do you find most valuable when scaling docking to this level?

