- ChemAxon
Development of High-Content Molecular Libraries for Filling the Gap between Target and Ligand Chemical Spaces
Dr. Mireille KRIER, Dr. Didier ROGNAN
Bioinformatics of the Drug, CNRS UMR 7081, F-67400 Illkirch, France

Databases for the Drug Discovery Process
Overview of our in-house databases: BioinfoDB, SBI, Sc-PDB, hGPCR-Lig

BioinfoDB: The supplier compounds database
http://bioinfo-pharma.u-strasbg.fr/bioinfo

Commercially available screening collections
These are important sources for identifying hits by virtual screening (VS), but they cannot be used directly for VS because of several issues:
- redundancy (intra- and inter-collection duplicates)
- diversity
- unknown drug- or lead-likeness
- unsuitable format (non-ionized, counter-ions, racemates)

‘Unified’ screening collections are available:
- i-Research Chemical Library (http://www.chemnavigator.com): 21 million samples; not ready to screen, not ‘clean’, not free
- MDL Screening Compounds Directory (http://www.mdli.com): 3.5 million structures; not ready to screen, ± clean, not free
- ZINC (http://blaster.docking.org/zinc/): 3.3 million structures; ready to screen, relatively clean, free

Preprocessing workflow for BioinfoDB
Raw libraries → file and data handling (error checking, molecule separation, duplicate removal) → filters → 3D structure generation (stereoisomer(s), protomeric state) → descriptor calculations → BioinfoDB
Definition of 162 filtering rules (property, functional group):
- 8 topological descriptors (e.g. MW, PSA)
- 11 atom-based matchcounts (inorganic, carbon/heteroatom ratio, etc.)
- 78 chemical moieties with matchcounts (aldehyde, aziridine, etc.)
- 32 dyes
- 34 promiscuous binding motifs
Rognan (2005) La Gazette du CINES, 20, 1-4.
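The file-handling and filtering steps of the preprocessing workflow can be sketched in a few lines. This is a minimal illustration only, assuming that an InChIKey and the MW/PSA descriptors have already been computed for each parent structure by a cheminformatics toolkit. The `Record` type, the `preprocess` function and the MW/PSA limits are hypothetical; they are not the 162 BioinfoDB rules, and the longest-fragment rule is a crude stand-in for proper molecule separation.

```python
from dataclasses import dataclass

# Hypothetical property limits for illustration only; these are NOT the
# 162 BioinfoDB filtering rules.
MAX_MW = 500.0   # molecular weight, Da
MAX_PSA = 140.0  # polar surface area, square angstroms


@dataclass(frozen=True)
class Record:
    supplier_id: str
    smiles: str    # raw supplier SMILES, possibly with counter-ions ("CCN.Cl")
    inchikey: str  # assumed precomputed by a toolkit for the parent structure
    mw: float      # assumed precomputed for the parent structure
    psa: float     # assumed precomputed for the parent structure


def largest_fragment(smiles: str) -> str:
    """Crude molecule separation: keep the longest dot-separated SMILES part."""
    return max(smiles.split("."), key=len)


def preprocess(records):
    """Error checking, duplicate removal and property filtering; survivors are salt-stripped."""
    seen = set()
    kept = []
    for rec in records:
        if not rec.smiles or not rec.inchikey:  # error checking
            continue
        if rec.inchikey in seen:  # duplicate within or between suppliers
            continue
        seen.add(rec.inchikey)
        if rec.mw > MAX_MW or rec.psa > MAX_PSA:  # property filter
            continue
        kept.append((rec.supplier_id, largest_fragment(rec.smiles)))
    return kept


# Toy records with made-up supplier IDs and keys.
toy = [
    Record("A-1", "CCO.Cl", "KEY-1", 46.07, 20.2),
    Record("B-7", "OCC", "KEY-1", 46.07, 20.2),    # same compound, other supplier
    Record("A-2", "C" * 40, "KEY-2", 563.1, 0.0),  # too heavy
]
kept = preprocess(toy)  # [("A-1", "CCO")]
```

Keying duplicates on a structure identifier rather than on the raw SMILES is what lets the same compound from two suppliers ("CCO.Cl" and "OCC") collapse to one entry.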
The ‘BioinfoDB’ Library
Necessity to customise a high-content collection of commercially available ‘drug-like’ compounds:
- coverage of all stock compounds, deliverable in vials
- removal of redundancy (within and between diverse collections)
- selection of user-defined profiles (drug-like, lead-like, scaffolds, fragments)
- accurate chemoinformatics (ionization, stereochemistry, tautomerism, descriptors)
- avoid format conversions
- storage in an SQL database (1-D: SMILES, 2-D: SD, 3-D: mol2)
- easy to browse (web interface)
- easy to update with a fully automated protocol
=> Choice of ChemAxon (JChem, Marvin Beans) to customise a database of high-quality ‘drug-like’ compounds

Filter the structures
Evaluator: JChem module that filters molecules by chemical expressions according to a user-defined intensity (pharmacological tool, drug-like, lead-like, fragment)
SMARTS definitions → filtering rules → drug-like structures

Browsing the BioinfoDB
Import the annotated SD file into an SQL table under JChem Base; browse by JSP queries.
http://bioinfo-pharma.u-strasbg.fr/bioinfo
Bioinfo 5.15.1 release

Drug-likeness of commercial libraries?
[Figure: drug-likeness (%) of the ACD, Asinex, Bionet, Biospecs, Chembridge, ChemDiv, ChemStar, CNRS, InterBioScreen, LeadQuest, Maybridge, Timtec and VitasM libraries]
Krier et al. (2006) J. Chem. Info. Model., 46, 512-524

SBI: The scaffolds database
http://bioinfo-pharma.u-strasbg.fr/scaffolds

Diversity analysis workflow of compound libraries
Compound library → remove redundancy (duplicates and tautomers detected by InChI) → cluster by MCS → classes and singletons → calculate R-groups → scaffolds library (21 393 scaffolds)
Krier et al. (2006) J. Chem. Info. Model., 46, 512-524

Quantify diversity by PC50C, NC50C
[Figure: R-group distribution, percentage of scaffolds vs. number of R-groups on scaffold (1 to 17), for rare scaffolds and for scaffolds with > 25 compounds in class]

Diversity of commercial collections
[Figure: classified compounds (×10^3) vs. PC50C for each collection, MedChem scaffolds (> 25 compounds in class); regions labelled by scaffold diversity / collection size: low / large, low / medium, medium / medium, high / small]
PC50C = % of classes containing 50% of the classified compounds
ASIg = ASINEX Gold
ASIp = ASINEX Platinum
CBG = CHEMBRIDGE
CDIc = CHEMDIV Clab
CDIi = CHEMDIV Idc
CNR = CNRS Patrimoine
CST = CHEMSTAR
IBSn = INTERBIOSCREEN Natural
IBSs = INTERBIOSCREEN Synthetic
MAY = MAYBRIDGE
MDDR = MDL Drug Data Report
NET = BIONET
SPE = SPECS
TIMn = TIMTEC Natural
TIMs = TIMTEC Synthetic
TRI = TRIPOS
VITs = VITAS-M Synthetic
VITt = VITAS-M Natural
Krier et al. (2006) J. Chem. Info. Model., 46, 512-524

Browsing the SBI

Sc-PDB: The database annotating proteins, active sites & ligands of the Protein Data Bank
http://bioinfo-pharma.u-strasbg.fr/scPDB

sc-PDB development
30 000 PDB entries → remove undesirable entries (solvent, detergent, etc.) → potential ligands (organic ligand, cofactor/ions, peptide ligand) → remove undesirable cofactors/ions and undesirable ligands → ligands and active sites (topological screen) → 1 ligand/site pair per entry: 6 415 entries (target, ligand, site)
Paul et al. (2004) Proteins, 54, 671-680.
Kellenberger et al. (2006) J. Chem. Info. Model., 46, 717-727.
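The PC50C measure used in the SBI diversity analysis can be computed directly from the class sizes of a clustered library. This is a minimal sketch under stated assumptions. It assumes NC50C is the smallest number of classes, taken largest first, that together hold at least half of the classified compounds (singletons excluded), and that PC50C is that count as a percentage of all classes. The function names are hypothetical.

```python
def nc50c(class_sizes):
    """Smallest number of classes (largest first) holding >= 50% of classified compounds."""
    sizes = sorted(class_sizes, reverse=True)  # largest classes first
    half = sum(sizes) / 2.0
    running = 0
    for n, size in enumerate(sizes, start=1):
        running += size
        if running >= half:
            return n
    return 0  # no classes


def pc50c(class_sizes):
    """NC50C expressed as a percentage of the total number of classes."""
    if not class_sizes:
        return 0.0
    return 100.0 * nc50c(class_sizes) / len(class_sizes)


skewed = [50, 20, 10, 10, 5, 5]  # toy library: one class dominates
even = [10] * 10                 # toy library: compounds spread evenly
print(nc50c(skewed), round(pc50c(skewed), 1))  # 1 16.7
print(nc50c(even), pc50c(even))                # 5 50.0
```

A low PC50C means a few scaffold classes account for most of the classified compounds, i.e. lower scaffold diversity. A PC50C close to 50% means the compounds are spread evenly over the classes.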
Sc-PDB: Distribution
1 706 non-redundant proteins; 2 721 non-redundant ligands
[Figure: number of proteins and number of ligands vs. number of occurrences]
[Figure: protein classes (oxidoreductase, transferase, hydrolase, lyase, isomerase, ligase); counts shown: 1850 (35%), 1469 (28%), 1129 (22%), 369 (7%), 253 (4.8%), 169 (3.2%)]
Ligand types: organics (64%), peptides/pseudopeptides (13%), nucleic acids (12%), sugars (10%), lipids (0.51%)

Browsing the Sc-PDB
The sc-PDB can be browsed to prioritize protein-ligand complexes using simple user-defined queries based on:
- ligand/cofactor properties AND/OR
- target properties

hGPCR-Lig: The database matching the GPCR protein space with the GPCR ligand space
http://bioinfo-pharma.u.strasbg.fr/hGPCRLig

GPCR topology
7 transmembrane helical domains (1 to 7), with the extracellular N-terminus and loops E1, E2, E3, and the intracellular loops I1, I2, I3 and C-terminus
Broad ligand diversity: photons, monoamines, peptides, chemokines, hormones, Ca++, glutamate, thrombin, anaphylatoxins C3a and C5a, EGF-TM7

GPCR chemoproteomics
Similar binding sites should recognize similar ligands, which makes it possible to:
- predict the ligands of a given target
- predict the target(s) of a given ligand
- compare targets (ligand binding sites)
- predict selectivity profiles (ligand, target)

Matching target with ligand space
Target space: ca. 800 human GPCRs. All druggable? How to organize it?
Ligand space: ca. 17 000 known GPCR ligands (MDDR database). Drug-like, lead-like, fragment-like? How to organize it?
Match both spaces? Assist hit discovery for new GPCRs?
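The sc-PDB browsing described above combines ligand/cofactor criteria and target criteria with AND/OR. A toy in-memory version can illustrate the logic. This is only a sketch: the `Entry` fields, the `select` function, the 500 Da cut-off and the entry IDs are hypothetical and do not reflect the actual sc-PDB schema or web interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Entry:
    entry_id: str      # hypothetical identifier, not a real sc-PDB code
    ligand_type: str   # e.g. "organic", "peptide", "nucleic acid", "sugar", "lipid"
    ligand_mw: float   # ligand molecular weight, Da
    target_class: str  # e.g. "oxidoreductase", "transferase", "hydrolase"


Predicate = Callable[[Entry], bool]


def select(entries: List[Entry], ligand_pred: Predicate,
           target_pred: Predicate, mode: str = "AND") -> List[str]:
    """Return the IDs of entries matching the ligand AND/OR target criteria."""
    if mode == "AND":
        keep = [e for e in entries if ligand_pred(e) and target_pred(e)]
    elif mode == "OR":
        keep = [e for e in entries if ligand_pred(e) or target_pred(e)]
    else:
        raise ValueError("mode must be 'AND' or 'OR'")
    return [e.entry_id for e in keep]


def small_organic(e: Entry) -> bool:
    # Hypothetical ligand criterion: organic ligand of at most 500 Da.
    return e.ligand_type == "organic" and e.ligand_mw <= 500.0


def is_hydrolase(e: Entry) -> bool:
    # Hypothetical target criterion: enzyme class.
    return e.target_class == "hydrolase"


toy = [
    Entry("e1", "organic", 350.0, "transferase"),
    Entry("e2", "peptide", 900.0, "hydrolase"),
    Entry("e3", "organic", 420.0, "hydrolase"),
]
print(select(toy, small_organic, is_hydrolase, "AND"))  # ['e3']
print(select(toy, small_organic, is_hydrolase, "OR"))   # ['e1', 'e2', 'e3']
```

Passing the criteria as plain predicates keeps ligand-side and target-side filters independent, so any pair can be combined with either operator.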
GPCR target space
The hGPCR database contains most human non-olfactory GPCRs. Aims:
- obtain reliable sequence alignments (7 TMs)
- generate reliable high-throughput 3D models
- avoid biasing the TM cavity by the X-ray structure of bovine rhodopsin (PDB entry 1f88)
http://bioinfo-pharma.u.strasbg.fr/hGPCRLig

GPCR-Mod: High-throughput modelling of GPCRs
UniProt sequences → GPCR-Align: multiple alignment (TMs) → GPCR-Gen: automated generation of 3-D coordinates (TMs) → 369 3-D models (ground state) → GPCR-find: TM cavity comparison → 369 TM cavities (30 residues)
Bissantz et al. (2004) JCICS, 44, 1162-1176.

Reducing the complexity of information
Set of 369 human GPCRs with highly variable amino acid sequences (290 to 6,200 residues). How can complexity be reduced without losing information?
[Figure: full sequences (290-6,230 residues), 7-TMs (189 residues) and cavity (30 residues) compared along the axes information, simplicity and focus]
Surgand et al. (2006) Proteins, 62, 509-532

Chemoproteomic analysis of human GPCRs
1. Determine a consensus TM cavity.
2. Concatenate the TM cavity-lining residues into ungapped sequences (30 residues pointing into the cavity and frequently used by most neutral antagonists/inverse agonists). Cavity positions: 1.35, 1.39, 1.42, 1.46, 2.57, 2.58, 2.61, 2.65, 3.28, 3.29, 3.32, 3.33, 3.36, 3.40, 4.56, 4.60, 5.38, 5.39, 5.42, 5.43, 5.46, 6.44, 6.48, 6.51, 6.52, 6.55, 7.35, 7.39, 7.43, 7.45
3. Derive a TM cavity-biased phylogenetic tree: pairwise distance: identity; hierarchical clustering: UPGMA; bootstrapping: 1,000 replicas; consensus tree.
[Figure: consensus tree with bootstrap values; clusters (number of receptors): Prostanoids (8), Adhesion (33), Glycoproteins (8), SREBs (6), MAS (11), Opsins (10), Secretin (15), Glutamate (23), Amines (45), Melanocortin (5), Brain-gut peptides (10), Adenosine (6), Frizzled (11), Lipids (14), Vasopeptides (7), Melatonin (7), Peptides (26), Opiates (13), Purines (35), Chemokines (23), Chemoattractants (17), Acids (5)]
Surgand et al. (2006) Proteins, 62, 509-532

Organising GPCR ligand space
MDDR database (150 K compounds) → keyword-based search, plus hand-curated GPCR ligands (2.5 K compounds) → 17 K GPCR ligands → MCS clustering → 958 scaffolds
Creation of an annotated compound library directed to the GPCR family

Matching target and ligand spaces
[Figure: enrichment (%) of scaffolds S1 to S10 across receptor clusters C1 to C5]
[Table columns: cluster, class, # compounds, scaffold, enrichment (%), cluster significance]

Matching ligand to target space
[Structure: query ligand]
1. Database search: 6 known GPCR targets (AG2R, AG2S, AG22, GHSR, L4R1, L4R2)
2. Cavity alignment
3. Extracting TM hotspots (privileged structures, chemoproteomic link)
4. Cavity search: 17 putative new GPCR targets (APJ, C5L2, FMLR, GALS, GPR1, Q9GZQ4, C3AR, CML1, G2A, GP15, MTLR, SPR1, C5AR, FML1, GALR, GP44, NTR1)
Experimental validation: AT1, AT2 ligands → GPR44 (CRTh2) ligands
Frimurer et al. (2005) Bioorg. Med. Chem. Lett., 15, 3707-3712

Measuring distances between two GPCRs
1-D approach: measure the identity of the 30 cavity-lining residues
Entry   TM cavity
5HT1A   TLLAVLAQFIDVCIIPYTSTAFWFFAGNYN
5HT7R   ILITVMVDFIDVCIIPYTSTAFWFFSELYN
Similarity = common/total = 21/30 = 0.7
3-D approach: project descriptors onto a cavity-centred sphere (discretized sphere of 80 triangles; 3 geometrical and 5 physicochemical descriptors; projection to Cα atoms; normalized score)
Similarity (5HT1A vs. 5HT7R) = 0.94

Browsing the hGPCR-Lig

Conclusions
- Creation of annotated compound libraries
- ChemAxon tools help us to build the basis for chemoproteomics analysis
- Easy interfacing with other applications

Acknowledgements
Claire SCHALON, Dr. Esther KELLENBERGER, Guillaume BRET, Dr. Didier ROGNAN, Nicolas FOATA, Pascal MULLER, Dr. Jean-Sébastien SURGAND
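The 1-D cavity comparison and the UPGMA step of the cavity-biased tree can be illustrated together. The sketch below computes percent identity over ungapped 30-residue cavity strings and builds a plain UPGMA tree on the distance d = 1 - identity. Two assumptions apply: the tree is returned as nested tuples, and ties are broken arbitrarily. The bootstrapping (1,000 replicas) and consensus-tree steps are not reproduced. The 5HT1A and 5HT7R strings are copied from the slide, and "TOY" is a hypothetical cavity added only so the tree has three leaves.

```python
from itertools import combinations


def cavity_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two ungapped, equal-length cavity strings."""
    if len(a) != len(b) or not a:
        raise ValueError("cavity strings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def upgma(names, cavities):
    """Minimal UPGMA on d = 1 - identity; returns the tree as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    dist = {(i, j): 1.0 - cavity_identity(cavities[i], cavities[j])
            for i, j in combinations(range(len(names)), 2)}  # keys: (low id, high id)
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # UPGMA update: size-weighted mean distance to every remaining cluster
        merged = {
            k: (n_i * dist[tuple(sorted((i, k)))] + n_j * dist[tuple(sorted((j, k)))])
            / (n_i + n_j)
            for k in clusters
        }
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        for k, d in merged.items():
            dist[(k, next_id)] = d  # every existing id is smaller than next_id
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


HT1A = "TLLAVLAQFIDVCIIPYTSTAFWFFAGNYN"  # 5HT1A cavity, from the slide
HT7R = "ILITVMVDFIDVCIIPYTSTAFWFFSELYN"  # 5HT7R cavity, from the slide
TOY = "A" * 30                           # hypothetical cavity, for illustration
print(cavity_identity(HT1A, HT7R))       # 0.7
print(upgma(["5HT1A", "5HT7R", "TOY"], [HT1A, HT7R, TOY]))  # ('TOY', ('5HT1A', '5HT7R'))
```

The identity of 0.7 reproduces the 1-D similarity quoted for 5HT1A vs. 5HT7R (21 of 30 positions identical). Because the cavity strings are ungapped and of fixed length, no alignment step is needed before comparing them.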