Tarball contents:

README - this document

Directories for each of the 40 DUD Targets containing:

  ligands.charge - gives unique protonation states of input ligands
    Format: one ligand protonation form per line
      SMILES input_id protonation_id mwt logp rb hba hbd charge

  decoys/
    decoys..picked - contains matched decoys for each unique ligand protonation state
      Format: ligand protomer and then 50 matched decoys
        first line:  ligand SMILES input_id protonation_id
        decoy lines: SMILES ZINC_ID ZINC_Protonation_ID

Automated Decoy Generation Method.

As in the original DUD, we property-matched decoys to ligands using molecular weight, estimated water-octanol partition coefficient (miLogP), rotatable bonds, hydrogen bond acceptors, and hydrogen bond donors, and we added net charge as a sixth property. We generated all ligand protonation states in the pH range 6-8 using Schrödinger’s Epik with arguments “-ph 7.0 -pht 1.0 -tp 0.20”. Molecular properties were computed using Molinspiration’s mib. For each unique set of 6 properties, we aimed to generate 50 matched decoys. For example, a single input ligand predicted to have 2 alternate charges would get 50 decoys property-matched to each charge.

Next, a pool of decoys was selected from ZINC using a dynamic protocol that adapted to local chemical space by narrowing or widening windows in seven steps around the 6 properties. The goal was to return 3000 to 9000 potential decoys that matched the ligand in the decoy’s reference state (predicted most prevalent form at pH 7.05).

In the final decoy procedure, ECFP4 fingerprints were generated by Scitegic’s Pipeline Pilot for ligands and potential decoys. The potential decoys were sorted by their maximum Tanimoto coefficient (Tc) to any ligand, and only the most dissimilar 25% passed this dissimilarity filter. We then de-duplicated decoys between all ligands by sorting decoys from least to most duplicated and assigning each decoy to the candidate ligand that had the fewest decoys already assigned.
This ensured that unique decoys were spread across the ligands as evenly as possible. Finally, 50 decoys, if available, were picked at random from this de-duplicated list.

Michael Mysinger, John Irwin, Brian Shoichet
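
The final decoy procedure described above (dissimilarity filter, de-duplication, random pick) can be illustrated with a minimal Python sketch. This is not the code used to build the set. The function names, the dictionary inputs, the alphabetical tie-breaking, and the fixed random seed are all assumptions made for illustration, and the maximum Tc values are assumed to have been computed elsewhere from ECFP4 fingerprints.

```python
import random
from collections import defaultdict


def dissimilarity_filter(max_tc, keep_fraction=0.25):
    """Keep the fraction of decoys with the lowest maximum Tc to any ligand.

    max_tc: dict mapping decoy id -> maximum Tc to any ligand (precomputed).
    Ties are broken by decoy id (an assumption, to keep the result stable).
    """
    ranked = sorted(max_tc, key=lambda d: (max_tc[d], d))
    n_keep = int(len(ranked) * keep_fraction)
    return set(ranked[:n_keep])


def assign_unique(candidates):
    """Assign every decoy to exactly one ligand.

    candidates: dict mapping ligand id -> set of candidate decoy ids.
    Decoys are visited from least to most duplicated; each one goes to
    the candidate ligand that currently holds the fewest decoys.
    """
    owners = defaultdict(list)
    for ligand in sorted(candidates):
        for decoy in candidates[ligand]:
            owners[decoy].append(ligand)

    assigned = {ligand: [] for ligand in candidates}
    for decoy in sorted(owners, key=lambda d: (len(owners[d]), d)):
        target = min(owners[decoy], key=lambda lig: (len(assigned[lig]), lig))
        assigned[target].append(decoy)
    return assigned


def pick_decoys(assigned, n=50, seed=0):
    """Randomly pick up to n decoys per ligand from its de-duplicated list."""
    rng = random.Random(seed)
    return {
        ligand: rng.sample(decoys, min(n, len(decoys)))
        for ligand, decoys in assigned.items()
    }
```

In this sketch a decoy that matches only one ligand is placed first, so decoys shared by several ligands remain available to fill whichever of those ligands is shortest on decoys at the time.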