How to Generate Reliable and Predictive CoMFA Models L. Zhang
Transcription
How to Generate Reliable and Predictive CoMFA Models L. Zhang
Current Medicinal Chemistry, 2011, 18, ????-???? 1 How to Generate Reliable and Predictive CoMFA Models L. Zhang1, K.-C. Tsai2, L. Du1, H. Fang1, M. Li*,1 and W. Xu*,1 1 Department of Medicinal Chemistry, School of Pharmacy, Shandong University, Jinan, Shandong 250012, China 2 The Genomics Research Center, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan Abstract: Comparative Molecular Field Analysis (CoMFA) is a mainstream and down-to-earth 3D QSAR technique in the coverage of drug discovery and development. Even though CoMFA is remarkable for high predictive capacity, the intrinsic data-dependent characteristic still makes this methodology certainly be handicapped by noise. It's well known that the default settings in CoMFA can bring about predictive QSAR models, in the meanwhile optimized parameters was proven to provide more predictive results. Accordingly, so far numerous endeavors have been accomplished to ameliorate the CoMFA model’s robustness and predictive accuracy by considering various factors, including molecular conformation and alignment, field descriptors and grid spacing. Herein, we would like to make a comprehensive survey of the conceivable descriptors and their contribution to the CoMFA model’s predictive ability. Keywords: CoMFA, conformation, alignment, fields, grid spacing. 1. INTRODUCTION Quantitative structure-activity relationship (QSAR) studies generally perform a crucial role in drug discovery and design as a ligand-based approach [1]. Such approaches are explicitly judgmental to provide not only the reliable prediction of specific properties of new compounds, but also the help to elucidate the possible molecular mechanism of the receptor-ligand interactions, in case that the experimental NMR or crystal structure of the target protein is unavailable [2]. As one of the most sought-after QSAR methods, Comparative Molecular Field Analysis (CoMFA) recruits interactive graphics and statistical techniques for correlating several molecular features, such as steric and electrostatic properties with their biological activities [3]. It needs to be noted that over the past few decades, CoMFA has become tremendously prevalent in regions of both industrial and academic research regarding QSAR studies. A Scifinder Scholar survey with the term ‘CoMFA’ indicated about 160 publications in 2009. This result compares with only about 50 publications in 1995 (Fig. 1). These CoMFA applications in drug design have been comprehensively summarized in several excellent book chapters as well as review articles [4-6]. terms of Lennard-Jones and Coulombic potentials, respectively. The steric and electrostatic potential energies are calculated by a probe atom, located at each vertex of a spaced lattice, in which a series of molecules are embedded. The performance of the standard CoMFA procedure requires the specification of both conformations and alignments of molecules. Despite their popularity, CoMFA analyses can be highly sensitive to the parameters used in QSAR modeling, including the setting of steric fields, molecular alignments, and grid spacing and dimensions. Because of these variances, there have been interests in finding ways to enhance QSAR quality and in building robust CoMFA models. The common steps for CoMFA modeling (Fig. 2) include: 1. Active molecules are placed in a three-dimensional grid (2Å spacing) encompassing all of the molecules. 2. At each grid point, steric energy (Lennard-Jones potential) and electrostatic energy are measured for each molecule by a probe atom (sp3-hybridized carbon with +1 charge). 3. To minimize domination by large steric and electrostatic energies, all energies that exceed a specified value (default 30 kcal/mol) are set to the cutoff value. 4. CoMFA uses a partial least-squares (PLS) analysis to predict activity from energy values at the grid points. Based on these procedures, hither we would like to review the up-to-date literatures for enhancing the quality of CoMFA modeling herein. CoMFA Settings It is a strenuous job for modelers to determine which settings, or combinations of settings, are suitable for their data sets. Notwithstanding the fact that a number of settings are provided, most CoMFA users still rely on the default parameters (Fig. 3). Hitherto, the influences of adjusting one or more CoMFA settings such as steric molecular field settings, grid distances and cut off values have been well explored by several studies [7-10]. Fig. (1). CoMFA publication number by year, 1988-2009. In CoMFA philosophy, the biological properties of molecules are correlated with steric and electrostatic potential energies in *Address correspondence to this author at the Department of Medicinal Chemistry, School of Pharmacy, Shandong University, Jinan, Shandong 250012, China; Tel/Fax: +86-531-8838-2076; E-mail: [email protected]; Tel/Fax: +86-531-88382264; E-mail: [email protected] 0929-8673/11 $58.00+.00 Karlen and coworkers produced a total of 6120 CoMFA models by settings optimization to evaluate the possibility of improving the predictive ability [11]. This effort was evaluated by nine different data sets, and the optimal models shared ordinary feature of steric fields derived by either indicator or squared transform types. The internal and external predictive abilities of the derived models were successfully strengthened by testing distinctive combinations of CoMFA settings. Molecular Conformation Cramer and his coworkers have annotated that “active conformation” and “alignment rules” are major would-be dilemmas for © 2011 Bentham Science Publishers Ltd. 2 Current Medicinal Chemistry, 2011 Vol. 18, No. 1 Zhang et al. tory for a CoMFA modeling, along with the bioactive shapes as prerequisite for a credible model. It’s pretty recognized that construction of bioactive conformation for a set of molecules is arduously realizable by way of crystallographic studies. Nevertheless, diverse computational approaches, including docking, molecular dynamic simulation and conformational sampling, are prosperously ready for use to predict the bioactive conformation. Provided that the holo-form crystal structure for target-ligand complex is attainable and the data set compounds have an analogous scaffold with the ligand, this cocrystallized ligand can be of course applied as a template for establishing the framework of the selected data set. This method is straightforward butcommon-used in the 3D-QSAR resolution [14-18]. It needs to be remarked that this method can be exclusively plausible for a bunch of rigid and structurally similar molecules. Notwithstanding, for a collection of highly structurally flexible or diverse molecules, this method is too much simple to be persuasive. Fig. (2). The standard CoMFA process. CoMFA [12]. Data sets for CoMFA examination should manoeuvre through the same mechanism of action, as well as have a common pharmacophore even if the molecular skeleton is different [4, 5, 13]. Three dimensional structures of the selected compounds are manda- A common used approach for predicting the bioactive conformation is molecular docking. In the docking procedure, a ligand was positioned to the functional site of target for determining the binding mode of the complex as well as generating the active conformation of the ligand. So far a number of predictive CoMFA models were successfully generated based on docking conformations, such as Chretien’s acetylcholinesterase (AChE) model [19], Bharatam’s glycogen synthase kinase-3 (GSK-3) model [20], Qiu’s c-Jun N-terminal kinase-1 (JNK-1) model [21], and Yao’s vascular endothelial growth factor receptor tyrosine kinase-2 (VEGFR-2) model [22]. Molecular dynamic simulation of the ligand-target complex can still provide a precise conformation. There are numerous successful examples for CoMFA modeling based on MD simulation, for ex- Fig. (3). Decision tree for determining possible combinations of CoMFA settings [11]. How to Generate Reliable and Predictive ample, Shang and coworkers built the initial structures of inhibitors by MD simulation of enzyme-substrate complexes for their CoMFA study [23]. Nevertheless, such accurate calculation is still too timeconsuming and resource-depended to be applied in QSAR modeling. Bioactive conformations have to be in silico simulated in case that the receptor experiential structure is not available. A systematic search [24-26] or annealing process [27-29] can be usually utilized to generate the low energy conformation for a template molecule; subsequently the remaining molecules in the dataset can be constructed based on the above-mentioned reference conformation. Active analog approach generates the possible conformations by conformational analysis and selects the best ones that satisfy the interatomic distances in a working hypothesis [30-32]. Local minimization method determines active conformations through systematic conformational searches followed by minimization on the rigid rotor search surface [33]. Nonetheless, this procedure can exclusively lead to the minimized conformations, even if the bioactive conformation is not equal to the lowest energy. A “similarity” problem must be figured out is that if compounds in the test set have significant conformational diversity compared with the training set molecules (contain certain features in a region not explored by the training set), the CoMFA model generated based on the training set can’t accurately predict activities of the test set [34]. Molecular Alignment Molecular superposition plays a decisive role in CoMFA analysis, since the relative interaction energies depend strongly on relative molecular positions. Flexible groups and conformation diversity present dilemmas to molecular alignment. In this case, a low quality alignment can result in unexpected input noise. Pseudoreceptor modeling [35-40] and 4D QSAR analysis [41-43] show high performance because they have the advantage of reducing input noise and decreasing risk of overfitting. As to CoMFA, various methods were developed aiming to seek a strict or a “perfect” molecular overlay. A receptor-based alignment approach can be achieved by the molecular docking method. In this procedure, similar molecules were located to the same region of the active site, and subsequently, an automatic alignment was generated. Using alignment derived from this receptor-based approach can generate highly predictive CoMFA models, and even shows superiority to the ligand-based method [40, 44-46]. The pharmacophore modeling is another automatic molecular alignment generation approach. Unlike molecular docking, this method is independent on the receptor structure. It determines the pharmacophore features of the ligands and uses these features to align the molecular structures. Since the problem of conformational diversity and flexibility is solved in some distance, it is widely applied in construction of CoMFA models [47-50]. Force fields can be used to align molecules and the field fit algorithm is used to calculate field values of the grid points. Lattice points with certain field value magnitude are employed for molecular overlapping. Dove and coworkers compared three different alignment rules, and found that the weighted field fit alignment rule gave the best model [51]. It is concluded that the weighted field fit method reduced the risk of producing artificial redundancy of the structures and ignoring entropy contributions to the free energy of binding. Even though the automatic alignment rules are widely practiced in the CoMFA approach with high performance, the manual alignment method is a competent alternative for yielding high predictive models as well. Tervo and coworker derived CoMFA models based on two alignment rules, docking and manual methods, for a set of Current Medicinal Chemistry, 2011 Vol. 18, No. 1 3 flexible molecules [52]. Interestingly, the results suggested that model with better predictive quality was constructed on the basis of manual alignment rule. The brandnew topomer CoMFA method introduced since 1998 can break the input structures into fragments and removes any core fragment structurally common to the entire series [53].The field values are calculated for the left fragments. It should be spotlighted that this novel fragment-based alignment method is proved to be promising CoMFA construction approach for both QSAR experts and newbies within few minutes, and results are generally equivalent to canonical CoMFA analysis [53, 54]. CoMFA Fields In CoMFA analysis, a collection of structurally aligned molecules are represented in terms of property fields, which are evaluated between a probe atom and each molecule at regularly spaced intervals on a grid. The acquiescent CoMFA fields, steric and electrostatic fields, are calculated by Lennard-Jones potential (eq 1) and Columbic potential (eq 2), respectively. n ( EvdW = Aij rij12 Cij rij6 i=1 n EC = i=1 qi q j Drij ) (1) (2) where EvdW is van der Waals interaction energy, rij is distance between atom i of the molecule and the grid point j where the probe atom is located, Aij and Cij are constants depending on the van der Waals, radii of the corresponding atoms. EC is coulomb interaction energy, qi is partial charge of atom i of the molecule, qj is charge of the probe atom, D is dielectric constant. Both potential functions are remarkably steep nearby the van der Waals surface of the molecules, thus resulting in rapid changes in surface descriptions (Fig. 4). The steric fields designate Van der Waals interactions between the molecule and its receptor. The standard CoMFA method functions the Lennard Jones 6-12 potential, which is characterized by a highly steep enhancement in energy at short distances to calculate the steric interaction between a probe atom and a molecule. So far different methods were applied to calculate steric field, their contribution to QSAR quality were evaluated as well [55, 56]. The indicator method assigned an energy 0 to steric potentials which fall below the cutoff value, and assigned a nominal energy (equal to the cutoff) to potentials falling at or above the cutoff [34]. The parabolic approach made the magnitude of the calculated potential at each lattice point be squared [11]. Gaussian function which has a slower and smoother decrease was facilitated in CoMSIA method for calculating molecular field potentials. To prevent unjustified large parametric variance, steric energy truncated at lower values was well documented because of the steep increase of the steric field contribution at lattice points close to the molecule. For example, Kim and Martin truncated the steric energy to 4.0 [58], in the meanwhile Klebe and coworkers set two different energy truncation of 30 and 5 for evaluating the effect of energy truncation [59]. In 2008, Sorich and coworkers published their studies for investigating the guidance of steric field settings on the predictive performance of CoMFA [7]. In this case, 3D QSAR models based on Lennard-Jones, indicator, parabolic and Qaussian steric fields were sensibly compared using 28 datasets. The results demonstrated that the preformance of Lennard-Jones and indicator fields that have a steep value decrease was better than of Gaussian type, which have a smoother decrease. 4 Current Medicinal Chemistry, 2011 Vol. 18, No. 1 Zhang et al. MMFF, PRODRG, Pullman and VC2003, on prediction accuracy in CoMFA and CoMSIA studies by using several benchmark datasets [66]. In general, the semi-empirical charges, such as AM1 and AM1-BCC, provided higher predictive CoMFA and CoMSIA models than the Gasteiger and Gasteiger-Hückel charges which are commonly used in QSAR studies. Interesting, the CFF partial charge was found to obtain the most predictive CoMFA and CoMSIA models. As an empirical charge-assigning method with a short computing time, the CFF charge offer advantages over the other eleven semi-empirical and empirical charges in performing the most accurate electrostatic potential calculations for CoMFA and CoMSIA studies. These results presented should help the selection of electrostatic potential models in CoMFA and CoMSIA studies. Grid Spacing Kroemer and coworker elucidated that the appearance of the molecular structures should be precisely characterized in the lattice, whereas the degree of differentiation should not be excessively high [33]. For that reason, an exceptional grid is constrained to discriminate atoms of different molecules and then formulate the corresponding values into the descriptor matrix. Fig. (4). Steric and electrostatic fields in CoMFA studies [57]. As one predominant effecter on CoMFA generation, electrostatic field is typically brought about by calculating the Coulomb potential between a probe and the molecule. The empirical partial charge methods, including Gasteiger-Marsili, Gasteiger-Huckel and MMFF94, are comprehensively used in released CoMFA studies. Developed on the concept of equalization of electronegativities, these emprical approaches are simple-handled and quick-witted in assigning the partial charges. The common-used semiempirical methods, including the modified neglect of differential overlap (MNDO), Austin model 1 (AM1) and parametric model 3 (PM3), are a tradeoff betwixt the empirical and ab initio quantum chemical approaches (HF/STO-3G, HF/3-21G* and HF/6-31G*) in terms of precision and computational time. The ab initio approaches are of high meticulousness but low computational efficiency. Contrarily, the empirical methods have fast speed but relative low accuracy. The semi-empirical techniques have enhancement in speed over the ab initio methods. Nevertheless, they did not have significant improvment on improvment on accuracy when compared with the empirical methods. Hitherto a number of studies were accomplished to evaluate the importance of partial charges on CoMFA model’s predictive quality. [60-63] For example, Welsh and coworkers performed CoMFA on a set of human immunodeficiency type 1 (HIV-1) protease inhibitors and different charge assignment schemes (Discover CVFF, Gasteiger-Marsili and AM1-ESP) were evaluated [64]. The best model was constructed using AM1-ESP partial charges. Recently, Sorich and coworker examined the contribution of partial charge calculation methods to the predictive ability of the generated CoMFA models [65]. In their study, Gasteiger, Gasteiger-Huckel, MMFF94, AM1, MNDO and PM3 charges were assigned to 30 data sets. The authors found that semi-empirical charge calculation methods suggested for the most predictive models, MMFF94 was also a good alternative for its predictive ability (not significantly worse than the semiempirical methods) and fast calculation speed. We also did a study for comparing twelve semi-empirical and empirical charge-assigning methods, including AM1, AM1-BCC, CFF, Del-Re, Formal, Gasteiger, Gasteiger-Hückel, Hückel, The influence of grid spacing has been thoroughly evaluated in foregoing surveys. Consequently, these results demonstrated that the domination of lattice location and size is not appreciable because of limited datasets [67, 68]. Tropsha and coworker presented that changing the dimensions of the region can lead to an unreasonable predictive ability of CoMFA model [69]. They mentioned that this characteristic should be well taken into account when operating CoMFA. Likewise, Sorich and coworkers substantiated that there is a statistically considerable function of grid spacing on predictability for convinced steric field settings [7]. They also manifested that lattice density has a somewhat subsidiary influence on the predictive capacity of Gaussian steric fields, whereas such parameters are able to manipulate the CoMFA quality based on Lennard-Jones, indicator and parabolic steric fields Other Descriptors Hydrophobic property was frequently thought about in CoMFA studies, for enlightening the CoMFA models’ robustness and predictive capability [70-72]. Wiese and coworkers established more than 350 CoMFA models and then benchmarked them using steric, electrostatic and hydrophobic fields alone and in combination [73]. These consequences indicated that hydrophobic fields could boost the correlative and predictive power in all cases. In the following year, they compared the model quality by 3D (HINT hydrophobic field) and logP (HINT and ClogP values) presentations of hydrophobicity, as well as evaluated the likeness between standard CoMFA and hydrophobic fields [74]. They further exhibited that ClogP is prior to logP when deriving hydrophobic fields for the generation of CoMFA model. This thought-provoking evidence uncovered that hydrophobic properties can perform the indicative role in describing the molecule. Even so, in the event that the hydrophobic properties have no correlation with the activity, these properties may present negative contribution to the QSAR quality. So far a number of examinations showed that consideration of H-bond descriptors can improve the quality of the CoMFA model certainly [75, 76]. Xu and coworkers carried out CoMFA models on a series of gamma-hydroxy butenolide endothelin antagonists in the presence of additional H-bond fields [77]. The results confirmed that H-bond fields significantly improved the quality of the derived model. Moreover, Pan and coworkers emphasized the influence of H-bond fields on improving the CoMFA model quality based on a set of protein tyrosine phophatase 1B (PTP1B) inhibitors [78]. The guidances of frontier orbital (HOMO and LUMO) energies as additional physical descriptors were also evaluated in diverse How to Generate Reliable and Predictive Current Medicinal Chemistry, 2011 Vol. 18, No. 1 5 QSAR studies. In result, the introduction of HOMO and LUMO are proven to be suggestive for the acquired CoMFA models [79-82]. The incorporation of ligand-receptor energies to the CoMFA model can contribute to the model quality as well [83]. In CoMFA modeling, a sp3 carbon with +1.0 charges is generally applied as the default probe atom. However, the selection of probe atoms can involve in the CoMFA generation. Hannongbua and coworker examined the impacts of different probe atoms (Csp3 (+1), Osp3 (-1) and H (+1)) on the quality of the generated CoMFA models. Their results elucidated that a combination of these three probes led to the most predictive model [84]. Statistical Analysis Partial least squares (PLS), developed in 1986, is a prevailing statistical algorithm for deriving linear relationships among columns of data [85-88]. Theoretically, PLS, a regression function, looks for linear correlation of column variance in target properties with variations in explanatory properties to minimize the sum of squares of deviations. It is like a factor analysis of the explanatory properties in which the object is to maximize alignment with the target property values rather than with the Cartesian or other axes. For this feature, PLS analysis is sometimes compared to principal component regression analysis (PCA) in its derivation of vectors from the Y and X blocks. PCA, introduced by Karl Pearson, involves a mathematical procedure that transforms a number of possibly correlated variables into the smaller number of uncorrelated variables [89]. Unlike PCA, which only considers the influence of the input data array, PLS possesses both the input and the output data matrixes into consideration. Another critical algorithm, SAMPLS, created by Bruce Bush, tremendously accelerates the cross-validation procedure [90]. In SAMPLS, the latent variables are derived from the n n covariance matrix (Fig. 5). Validation methods such as cross-validation (including leave-one-out [12], leave-many-out [91-93] and leave-group-out [94]) and bootstrapping [95-97] have employed to examine the robustness of the generated models. The quality of the resulting QSAR models can be judged by statistical means such as r2 (the fraction of explained variance, eq. 3) for the test set, and by q2 (the cross-validated or predictive r2, eq. 4) for the training set. The fraction of explained variance, r2, measures the QSAR model’s ability to interpret the variance in the data; in other words, it estimates the goodness-of-fit of the regression model derived from the training set. The predictive r2, or q2, refers to the internal robustness of the QSAR model. Model quality estimation are procured either by using cross-validation procedure (internal) or by predicting external compounds (previously not used in the model). n r 2 = 1 (Y c i=1 n (Y m Fig. (5). Description of PLS analysis, vectors u and t are derived from the Y block and the X block, representatively. BAi = logarithms of activities, Sij = steric field variable of molecule i in the grid point j, Eij = electrostatic field variable of molecular i in the grid point j. q2 > 0.5 r > 0.6 Ya )2 (3) Ya )2 n PRESS = 1 SD (Y a i=1 n (Y a Yp ) (6) (r r ) (r r ' ) < 0.1 or < 0.1 r2 r2 (7) 0.85 k 1.15 or 0.85 k’ 1.15 (8) 2 i=1 q 2 = 1 (5) 2 2 0 2 2 0 2 2 (4) Ym )2 i=1 Where Ya is an actual value, Yp is a predictive value, Yc is the average value of predicted values, Ym is the average value of observed activities. PRESS = predictive sum of squares, SD = sum of squared deviations. Recently Tropsha and coworkers introduced a new validation criterion for robust QSAR models (eq. 5-8) [98, 99]. They consider a QSAR model predictive, if the following conditions are satisfied: where r is the correlation coefficient between the predicted and observed activities, r02 is the coefficient of determination between predicted and observed activities characterizing linear regression with the Y-intercept set to zero (i.e., described by Y = kX, where Y and X are actual and predicted activity, respectively), r '20 is the coefficient of determination between observed and predicted activities, and k and k’ are the slopes of the regression lines through the origin. The aforementioned procedure accentuated that high q2 does not ensure a predictive QSAR model. The predictive capability can only be appraised by an external set of molecules, which are not 6 Current Medicinal Chemistry, 2011 Vol. 18, No. 1 used for building the model. Models with high internal q2 but low external predictive abilities can be picked out by using these equations that were verified by 160 QSAR models. Zhang et al. GSK-3 = glycogen synthase kinase-3 JNK-1 = c-Jun N-terminal kinase-1 VEGFR-2 = vascular endothelial growth factor receptor tyrosine kinase-2 MNDO = modified neglect of differential overlap AM1 = Austin model 1 CONCLUSION The CoMFA technique has been developed for more than one couple of decades. Thus far a great number of CoMFA studies were performed based on this state-of-the-art approach. Scientists have also contributed everlasting and booming endeavors to improve the predictive quality of the CoMFA model. Herein, the practicable CoMFA descriptors, including molecular conformation, structural alignment, molecular fields, grid spacing and additional physical chemical properties, were well presented as a tutorial review to provide possible guidance to the further CoMFA studies. Among these crucial determinants, bioactive conformation and molecular superposition engage an essential portrayal in the CoMFA procedure, while different combination of fields and physical chemical properties results in diverse predictable levels. High predictive models can also be realized by adjusting settings, such as energy cutoff values, lattice size and probe types. In sum, suggestions for future CoMFA studies are outlined below. 1. The initial geometries of the molecules should be in bioactive or theoretical active framework; 2. Different charge methods should be carefully considered to establish a muscular CoMFA model; 3. A reasonable molecular alignment is mandatory for a trustworthy CoMFA model; 4. Cut-off values are needed both for the steric and electrostatic energy calculation and for the PLS analysis to reduce unwanted variance; 5. Other descriptors, such as ClogP, can substantially improve the reliability of the CoMFA model. In the absence of statistic significance in CoMFA generation, those descriptors can be taken into consideration; 6. Different probe atoms could be attentively considered to ameliorate the credibility of CoMFA model; 7. The lattice location and size should be unanimously deliberated. PM3 = parametric model 3 HIV-1 = human immunodeficiency type 1 PTP1B = protein tyrosine phophatase 1B SAMPLS = sample-distance partial least squares REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] ACKNOWLEDGEMENTS The present work was financed by grants from the PhD Programs Foundation of Ministry of Education of China (No. 20090131120080), the Doctoral Fund of Shandong Province (No. BS2009SW011), Fok Ying Tung Education Foundation (No. 122036), National Natural Science Foundation of China (No. 30901836 and 81001362), Shandong Natural Science Foundation (No. JQ201019) and Independent Innovation Foundation of Shandong University, IIFSDU (No. 2010JQ005). ABBREVIATIONS [15] [16] [17] [18] CoMFA = comparative molecular field analysis QSAR = quantitative structure-activity relationship PLS = partial least-squares [19] [20] PCA = principal component analysis MD = molecular dynamic [21] 3D-QSAR = three dimensional quantitative structureactivity relationship [22] comparative molecular similarity indices analysis [23] CoMSIA AChE = = acetylcholinesterase Lill, M.A. Multi-dimensional QSAR in drug discovery. Drug Discov. Today, 2007, 12, 1013-1017. Yang, G.F.; Huang, X. Development of quantitative structure-activity relationships and its application in rational drug design. Curr. Pharm. Des., 2006, 12, 4601-4611. Cramer, R.D.; Patterson, D.E.; Bunce, J.D. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steriods to carrier proteins. J. Am. Chem. Soc., 1988, 110, 5959-5967. Kubinyi, H. QSAR and 3D QSAR in drug design Part 1: methodology. Drug Discov. Today, 1997, 2, 457-467. Kubinyi, H. QSAR and 3D QSAR in drug design Part 2: applications and problems. Drug Discov. Today, 1997, 2, 538-546. Podlogar, B.L.; Ferguson, D.M. QSAR and CoMFA: a perspective on the practical application to drug discovery. Drug Des. Discov., 2000, 17, 4-12. Mittal, R.R.; McKinnon, R.A.; Sorich, M.J. Effect of steric molecular field settings on CoMFA predictivity. J. Mol. Model., 2008, 14, 59-67. Bursi, R.; Grootenhuis, P.D.J. Comparative molecular field analysis and energy interaction studies of thrombin-inhibitor complexes. J. Comput. Aided Mol. Des., 1999, 13, 221-232. Dinan, L.; Hormann, R.E.; Fujimoto, T. An extensive ecdysteroid CoMFA. J. Comput. Aided Mol. Des., 1999, 13, 185-207. Melville, J.L.; Hirst, J.D. On the stability of CoMFA models. J. Chem. Inf. Comput. Sci., 2004, 44, 1294-1300. Peterson, S.D.; Schaal, W.; Karlen, A. Improved CoMFA modeling by optimization of settings. J. Chem. Inf. Model., 2006, 46, 355-364. Richard D. Cramer, I., David E. Patterson, and Jeffrey D. Bunce. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc., 1988, 110, 59595967. Kubinyi, H. QSAR and 3D QSAR in drug design Part 1: methodology. Drug Discov. Today, 1997, 2, 457-467. Chavatte, P.; Yous, S.; Beaurain, N.; Mesangeau, C.; Ferry, G.; Lesieur, D. Three-dimensional quantitative structure-activity relationship of arylalkylamine N-acetyltransferase (AANAT) inhibitors: A comparative molecular field analysis. Quant. Struct-Act. Rel., 2002, 20, 414-421. Tsai, K.C.; Lin, T.H. A ligand-based molecular modeling study on some matrix metalloproteinase-1 inhibitors using several 3D QSAR techniques. J. Chem. Inf. Comput. Sci., 2004, 44, 1857-1871. Yu, Z.H.; Niu, C.W.; Ban, S.R.; Wen, X.; Xi, Z. Study on structure-activity relationship of mutation-dependent herbicide resistance acetohydroxyacid synthase through 3D-QSAR and mutation. Chinese Sci. Bull., 2007, 52, 1929-1941. Lei, B.L.; Du, J.; Li, S.Y.; Liu, H.X.; Ren, Y.Y.; Yao, X.J. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) of thiazolone derivatives as hepatitis C virus NS5B polymerase allosteric inhibitors. J. Comput. Aided Mol. Des., 2008, 22, 711-725. Kaur, K.; Talele, T. Structure-based CoMFA and CoMSIA study of indolinone inhibitors of PDK1. J. Comput. Aided Mol. Des., 2009, 23, 25-36. Bernard, P.P.; Kireev, D.B.; Pintore, M.; Chretien, J.R.; Fortier, P.L.; Froment, D. A CoMFA study of enantiomeric organophosphorus inhibitors of acetylcholinesterase. J. Mol. Model., 2000, 6, 618-629. Dessalew, N.; Patel, D.S.; Bharatam, P.V. 3D-QSAR and molecular docking studies on pyrazolopyrimidine derivatives as glycogen synthase kinase-3 beta inhibitors. J. Mol. Graph. Model., 2007, 25, 885-895. Yi, P.; Qiu, M.H. 3D-QSAR and docking studies of aminopyridine carboxamide inhibitors of c-Jun N-terminal kinase-1. Eur. J. Med. Chem., 2008, 43, 604-613. Du, J.; Lei, B.L.; Qin, J.; Liu, H.X.; Yao, X.J. Molecular modeling studies of vascular endothelial growth factor receptor tyrosine kinase inhibitors using QSAR and docking. J. Mol. Graph. Model., 2009, 27, 642-654. Huang, M.L.; Yang, D.Y.; Shang, Z.C.; Zou, J.W.; Yu, Q.S. 3D-QSAR studies on 4-hydroxyphenylpyruvate dioxygenase inhibitors by comparative molecular field analysis (CoMFA). Bioorg. Med. Chem. Lett., 2002, 12, How to Generate Reliable and Predictive [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] 2271-2275. Aher, Y.D.; Agrawal, A.; Bharatam, P.V.; Garg, P. 3D-QSAR studies of substituted 1-(3,3-diphenylpropyl)-piperidinyl amides and ureas as CCR5 receptor antagonists. J. Mol. Model., 2007, 13, 519-529. Cho, W.J.; Kim, E.K.; Park, I.Y.; Jeong, E.Y.; Kim, T.S.; Le, T.N.; Kim, D.D.; Leed, E.S. Molecular Modeling of 3-arylisoquinoline antitumor agents active against A-549. A comparative molecular field analysis study. Bioorg. Med. Chem., 2002, 10, 2953-2961. Juan, A.A.S. Towards predictive inhibitor design for the EGFR autophosphorylation activity. Eur. J. Med. Chem., 2008, 43, 781-791. Chakraborti, A.K.; Gopalakrishnan, B.; Sobhia, M.E.; Malde, A. 3D-QSAR studies of indole derivatives as phosphodiesterase IV inhibitors. Eur. J. Med. Chem., 2003, 38, 975-982. Ganguly, S.; Banerjee, S. 3D-QSAR studies of imidazole derivatives as Candida albicans P450-demethylase inhibitors. Asian J. Chem., 2008, 20, 4595-4608. Adane, L.; Bharatam, P.V. 3D-QSAR analysis of cycloguanil derivatives as inhibitors of A16V+S108T mutant Plasmodium falciparum dihydrofolate reductase enzyme. J. Mol. Graph. Model., 2009, 28, 357-367. Sufrin, J.R.; Dunn, D.A.; Marshall, G.R. Steric mapping of the L-methionine binding site of ATP:L-methionine S-adenosyltransferase. Mol. Pharmacol., 1981, 19, 307-313. J-P Bjorkroth, T.A.P., and J. Lindroos. Comparative Molecular Field Analysis of Some Clodronic Acid Esters. J. Med. Chem.,1991, 34, 23382343. Deborah A. Loughney, C.F.S. A comparison of progestin and androgen receptor binding using the CoMFA technique. J. Comput. Aided Mol. Des.,1992, 6, 569-581. Demeter, D.A.; Weintraub, H.J.R.; Knittel, J.J. The local minima method (LMM) of pharmacophore determination: A protocol for predicting the bioactive conformation of small, conformationally flexible molecules. J. Chem. Inf. Comput. Sci., 1998, 38, 1125-1136. Kroemer, R.T.; Hecht, P.; Guessregen, S.; Liedl, K.R. Improving the predictive quality of CoMFA models. Perspect. Drug Discov. Des., 1998, 12, 41-56. Pei, J.F.; Zhou, J.J.; Xie, G.R.; Chen, H.M.; He, X.F. PARM: A practical utility for drug design. J. Mol. Graph. Model., 2001, 19, 448-454. Peng, T.; Pei, J.F.; Zhou, J.J. 3D-QSAR and receptor modeling of tyrosine kinase inhibitors with flexible atom receptor model (FLARM). J. Chem. Inf. Comput. Sci., 2003, 43, 298-303. Pei, J.F.; Chen, H.; Liu, Z.M.; Han, X.F.; Wang, Q.; Shen, B.; Zhou, J.J.; Lai, L.H. Improving the quality of 3D-QSAR by using flexible-ligand receptor models. J. Chem. Inf. Model., 2005, 45, 1920-1933. Lu, A.J.; Zhou, J.J. Pseudoreceptor models and 3D-QSAR for imidazobenzodiazepines at GABA(A)/BzR subtypes alpha(x)beta(3)gamma(2) [x=1-3, 5, and 6] via flexible atom receptor model. J. Chem. Inf. Comput. Sci., 2004, 44, 1130-1136. Shen, B.; Lu, Z.H.; Chi, X.B.; Lu, H.F.; Ren, T.R. Research on pseudoreceptor models for the inhibitors at GABA receptors via flexible atom receptor model. Acta. Phys-Chimi. Sin., 2005, 21, 800-803. Wichapong, K.; Lindner, M.; Pianwanit, S.; Kokpol, S.; Sippl, W. Receptorbased 3D-QSAR studies of checkpoint Wee1 kinase inhibitors. Eur. J. Med. Chem., 2009, 44, 1383-1395. A. J. Hopfinger, S.W., John S. Tokarski, Baiqiang Jin, Magaly Albuquerque, Prakash J. Madhav, Chaya Duraiswami. Construction of 3D-QSAR Models Using the 4D-QSAR Analysis Formalism. J. Am. Chem. Soc., 1997, 119, 10509-10524. Ravi, M.; Hopfinger, A.J.; Hormann, R.E.; Dinan, L. 4D-QSAR analysis of a set of ecdysteroids and a comparison to CoMFA modeling. J. Chem. Inf. Comput. Sci., 2001, 41, 1587-1604. Martins, J.P.A.; Barbosa, E.G.; Pasqualoto, K.F.M.; Ferreira, M.M.C. LQTA-QSAR: A New 4D-QSAR Methodology. J. Chem. Inf. Model., 2009, 49, 1428-1436. Datar, P.A.; Coutinho, E.C. A CoMFA study of COX-2 inhibitors with receptor based alignment. J. Mol. Graph. Model., 2004, 23, 239-251. Huang, H.Q.; Pan, X.L.; Tan, N.H.; Zeng, G.Z.; Ji, C.J. 3D-QSAR study of sulfonamide inhibitors of human carbonic anhydrase II. Eur. J. Med. Chem., 2007, 42, 365-372. Ma, X.; Zhou, L.; Zuo, Z.L.; Liu, J.; Yang, M.; Wang, R.W. Molecular docking and 3-D QSAR studies of substituted 2,2-bisaryl-bicycloheptanes as human 5-Lipoxygenase-Activating Protein (FLAP) inhibitors. Qsar. Comb. Sci., 2008, 27, 1083-1091. Palomer, A.; Pascual, J.; Cabre, F.; Garcia, M.L.; Mauleon, D. Derivation of pharmacophore and CoMFA models for leukotriene D-4 receptor antagonists of the quinolinyl(bridged) aryl series. J. Med. Chem., 2000, 43, 392-400. Long, W.; Liu, P.X.; Li, Q.; Xu, Y.; Gao, J. 3D-QSAR studies on a class of IKK-2 inhibitors with GALAHAD used to develop molecular alignment models. Qsar. Comb. Sci., 2008, 27, 1113-1119. Chen, Y.D.; Li, F.F.; Tang, W.Q.; Zhu, C.C.; Jiang, Y.J.; Zou, J.W.; Yu, Q.S.; You, Q.D. 3D-QSAR studies of HDACs inhibitors using pharmacophore-based alignment. Eur. J. Med. Chem., 2009, 44, 2868-2876. Narkhede, S.S.; Degani, M.S. Pharmacophore refinement and 3D-QSAR studies of histamine H-3 antagonists. Qsar. Comb. Sci., 2007, 26, 744-753. Dove, S.; Buschauer, A. Improved alignment by weighted field fit in CoMFA of histamine H-2 receptor agonists imidazolylpropylguanidines. Current Medicinal Chemistry, 2011 Vol. 18, No. 1 [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] 7 Quant. Struct-Act. Rel., 1999, 18, 329-341. Tervo, A.J.; Nyronen, T.H.; Ronkko, T.; Poso, A. Comparing the quality and predictiveness between 3D QSAR models obtained from manual and automated alignment. J. Chem. Inf. Comput. Sci., 2004, 44, 807-816. Cramer, R.D. Topomer CoMFA: A design methodology for rapid lead optimization. J. Med. Chem., 2003, 46, 374-388. Chung, J.Y.; Pasha, F.A.; Chung, H.; Yang, B.S.; Lee, C.; Oh, J.S.; Moon, M.W.; Cho, S.J.; Cho, A.E. Topomer-CoMFA study of tricyclic azepine derivatives-EGFR inhibitors. Mol. Cell. Toxicol., 2008, 4, 78-84. Chuman, H.; Karasawa, M.; Fujita, T. A novel three-dimensional QSAR procedure: Voronoi field analysis. Quant. Struct-Act. Rel., 1998, 17, 313326. Timofei, S.; Kurunczi, L.; Schmidt, W.; Simon, Z. Steric and electrostatic effects in dye-cellulose interactions by the MTD and CoMFA approaches. Sar. Qsar. Environ. Res., 2002, 13, 219-226. Böhm, H.-J.; Klebe, G.; Kubiny, H. Wirkstoffdesign. Spektrum Akademischer: Heidelberg, 1996. Kim, K.H.M., Y.C. Direct prediction of dissociation constants (pKa's) of clonidine-like imidazolines, 2-substituted imidazoles and 1-methyl-2substituted-imidazoles from 3D structures using a comparative molecular field analysis (CoMFA) approach. J. Med. Chem., 1991, 34, 2056-2060. Klebe, G.A., U. On the prediction of binding properties of drug molecules by comparative molecular field analysis. J. Med. Chem., 1993, 36, 70-80. Puri, S.; Chickos, J.S.; Welsh, W.J. Three-dimensional quantitative structureproperty relationship (3D-QSPR) models for prediction of thermodynamic properties of polychlorinated biphenyls (PCBs): Enthalpy of sublimation. J. Chem. Inf. Comput. Sci., 2002, 42, 109-116. Puri, S.; Chickos, J.S.; Welsh, W.J. Three-dimensional Quantitative Structure-Property Relationship (3D-QSPR) models for prediction of thermodynamic properties of polychlorinated biphenyls (PCBs): Enthalpy of vaporization. J. Chem. Inf. Comput. Sci., 2002, 42, 299-304. Hirons, L.; Holliday, J.D.; Jelfs, S.P.; Willett, P.; Gedeck, P. Use of the Rgroup descriptor for alignment-free QSAR. Qsar. Comb. Sci., 2005, 24, 611619. Luo, H.B.; Cheng, Y.K. Quantitative structure-retention relationship of nucleic-acid bases revisited. CoMFA on purine RPLC retention. Qsar. Comb. Sci., 2005, 24, 968-975. Jayatilleke, P.R.N.; Nair, A.C.; Zauhar, R.; Welsh, W.J. Computational studies on HIV-1 protease inhibitors: Influence of calculated inhibitorenzyme binding affinities on the statistical quality of 3D-QSAR CoMFA models. J. Med. Chem., 2000, 43, 4446-4451. Mittal, R.R.; Harris, L.; McKinnon, R.A.; Sorich, M.J. Partial Charge Calculation Method Affects CoMFA QSAR Prediction Accuracy. J. Chem. Inf. Model., 2009, 49, 704-709. Tsai, K.C.; Chen, Y.C.; Hsiao, N.W.; Wang, C.L.; Lin, C.L.; Lee, Y.C.; Li, M.; Wang, B. A comparison of different electrostatic potentials on prediction accuracy in CoMFA and CoMSIA studies. Eur. J. Med. Chem., 2010, 45, 1544-1551. Jung, M.; Kim, H. CoMFA of artemisinin derivatives: Effect of location and size of lattice. Bioorg. Med. Chem. Lett., 2001, 11, 2041-2044. Kunick, C.; Lauenroth, K.; Wieking, K.; Xie, X.; Schultz, C.; Gussio, R.; Zaharevitz, D.; Leost, M.; Meijer, L.; Weber, A.; Jorgensen, F.S.; Lemcke, T. Evaluation and comparison of 3D-QSAR CoMSIA models for CDK1, CDK5, and GSK-3 inhibition by paullones. J. Med. Chem., 2004, 47, 22-36. Bucholtz, E.C.; Tropsha, A. The effect of region size on CoMFA analyses. Med. Chem. Res., 1999, 9, 675-685. Carpy, A. Importance of lipophilicity in molecular design - Foreword. Analusis, 1999, 27, 3-6. Vajragupta, O.; Boonchoong, P.; Wongkrajang, Y. Comparative quantitative structure-activity study of radical scavengers. Bioorg. Med. Chem., 2000, 8, 2617-2628. Nakagawa, Y.; Takahashi, K.; Kishikawa, H.; Ogura, T.; Minakuchi, C.; Miyagawa, H. Classical and three-dimensional QSAR for the inhibition of [H-3]ponasterone A binding by diacylhydrazine-type ecdysone agonists to insect Sf-9 cells. Bioorg. Med., 2005, 13, 1333-1340. Pajeva, I.; Wiese, M. Molecular modeling of phenothiazines and related drugs as multidrug resistance modifiers: A comparative molecular field analysis study. J. Med. Chem.,1998, 41, 1815-1826. Pajeva, I.K.; Wiese, M. Interpretation of CoMFA results - A probe set study using hydrophobic fields. Quant. Struct-Act. Rel., 1999, 18, 369-379. Bursi, R.; Sawa, M.; Hiramatsu, Y.; Kondo, H. A three-dimensional quantitative structure-activity relationship study of heparin-binding epidermal growth factor shedding inhibitors using comparative molecular field analysis. J. Med. Chem., 2002, 45, 781-788. Bursi, R.; Grootenhuis, A.; van der Louw, J.; Verhagen, J.; de Gooyer, M.; Jacobs, P.; Leysen, D. Structure-activity relationship study of human liver microsomes-catalyzed hydrolysis rate of ester prodrugs of MENT by comparative molecular field analysis (CoMFA). Steroids, 2003, 68, 213-220. Gu, C.M.; Hou, T.J.; Xu, X.J. Comparative molecular field analysis of gamma-hydroxy butenolide endothelin antagonists. Chem. J. Chinese U., 2001, 22, 1864-1868. Pan, Y.M.; Ji, M.J.; Ye, X.Q.; Kuang, P.X. 3D-QSAR analyses of novel benzofuranyl and benzothiophenyl biphenyls as PTP1B inhibitors. Chinese J. Org. Chem., 2003, 23, 167-172. Tuppurainen, K. Frontier orbital energies, hydrophobicity and steric factors 8 Current Medicinal Chemistry, 2011 Vol. 18, No. 1 [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] as physical QSAR descriptors of molecular mutagenicity. A review with a case study: MX compounds. Chemosphere, 1999, 38, 3015-3030. Chen, H.F.; Yao, X.J.; Petitjean, M.; Xia, H.O.; Yao, J.H.; Panaye, A.; Doucet, J.P.; Fan, B.T. Insight into the bioactivity and metabolism of human glucagon receptor antagonists from 3D-QSAR analyses. Qsar. Comb. Sci., 2004, 23, 603-620. Christensen, H.S.; Boye, S.V.; Thinggaard, J.; Sinning, S.; Wiborg, O.; Schiott, B.; Bols, M. QSAR studies and pharmacophore identification for arylsubstituted cycloalkenecarboxylic acid methyl esters with affinity for the human dopamine transporter. Bioorg. Med. Chem., 2007, 15, 5262-5274. Chen, H.F. Computational study of histamine H-3-receptor antagonist with support vector machines and three dimension quantitative structure activity relationship methods. Ana. Chim. Acta., 2008, 624, 203-209. Wolohan, P.; Reichert, D.E. Use of binding energy in comparative molecular field analysis of isoform selective estrogen receptor ligands. J. Mol. Graph. Model., 2004, 23, 23-38. Maitarad, P.; Saparpakorn, P.; Hannongbua, S.; Kamchonwongpaisan, S.; Tarnchompoo, B.; Yuthavong, Y. Particular interaction between pyrimethamine derivatives and quadruple mutant type dihydrofolate reductase of Plasmodium falciparum: CoMFA and quantum chemical calculations studies. J.Enzyme Inhibi. Med. Chem., 2009, 24, 471-479. Kubinyi, H. QSAR : Hansch analysis and related approaches. VCH: Weinheim, 1993. Kubinyi, H. 3D QSAR in Drug Design. Theory, Methods and Applications. Kubinyi, Hugo ed. ESCOM: Leiden, 1993. van de Waterbeemd, H. Chemometric Methods in Molecular Design. VCH: Weinheim, 1995. van de Waterbeemd, H. Advanced Computer-Assisted Techniques in Drug Discovery. VCH: Weinheim, 1995. Pearson, K. On lines and planes of closest fit to systems of points in space. Zhang et al. [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] Philos.Mag., 1901, 6, 559-576. Bush, B.L.; Nachbar, R.B. Sample-distance Partial Least Squares: PLS optimized for many variables, with application to CoMFA. J. Comput. Aided Mol. Des., 1993, 7, 587-619. Clark, R.D. Boosted leave-many-out cross-validation: the effect of training and test set diversity on PLS statistics. J. Comput. Aided Mol. Des., 2003, 17, 265-275. Polanski, J.; Gieleciak, R.; Bak, A. Probability issues in molecular design: Predictive and modeling ability in 3D-QSAR schemes. Comb. Chem. High Throughput Screen., 2004, 7, 793-807. Jojart, B.; Marki, A. Receptor-based QSAR studies of non-peptide human oxytocin receptor antagonists. J. Mol. Graph. Model., 2007, 25, 711-720. Maw, H.H.; Hall, L.H. E-state modeling of corticosteroids binding affinity validation of model for small data set. J. Chem. Inf. Comput. Sci., 2001, 41, 1248-1254. Puntambekar, D.S.; Giridhar, R.; Yadav, M.R. Understanding the antitumor activity of novel tricyclicpiperazinyl derivatives as farnesyltransferase inhibitors using CoMFA and CoMSIA. Eur. J. Med. Chem., 2006, 41, 12791292. Nair, P.C.; Sobhia, M.E. CoMFA based de novo design of pyridazine analogs as PTP1B inhibitors. J. Mol. Graph. Model., 2007, 26, 117-123. Ramar, S.; Bag, S.; Tawari, N.R.; Degani, M.S. 3-D-QSAR analysis of 2(oxalylamino) benzoic acid class of protein tyrosine phosphatase 1B inhibitors by CoMFA and Cerius2.GA. Qsar. Combi. Sci., 2007, 26, 608617. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model., 2002, 20, 269-276. Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y.D.; Lee, K.H.; Tropsha, A. Rational selection of training and test sets for the development of validated QSAR models. J. Comput. Aided. Mol. Des., 2003, 17, 241-253.