RESEARCH ACHIEVEMENTS - RVN Scholarship Fund

RESEARCH ACHIEVEMENTS
Student: Đỗ Trọng Nhất; Date of birth: 22/12/1990; Student ID: D08-140
Full-time Pharmacy class, 2009-2014 cohort
Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City
• Research published internationally and regionally
- “Design, Synthesis and Biological Evaluation of some Chalcone Derivatives as Potential
Pancreatic Lipase Inhibitors”,
published in “The 17th International Electronic Conference on Synthetic Organic
Chemistry 2013” at http://www.sciforum.net/conference/ecsoc-17/paper/2299
(abstract and confirmation letter from the supervisor attached)
- “In silico modeling for antimalarial compounds”, poster presentation at the PharmaIndochina
International Conference 2013, published in the “Proceedings of The Eighth Indochina
Conference on Pharmaceutical Sciences, 2013”, pages 503-509.
(full paper and confirmation letter from the supervisor attached)
• City-level research
- “Study of the binding between the enzyme histone deacetylase 2 and hydroxamate and
mercaptoacetamide derivatives”, published in the journal Y Hoc TP. Ho Chi Minh, 2014,
Vol. 18, Supplement of No 2, pages 317-323.
(abstract and confirmation letter from the supervisor attached)
• Research entered in the 2014 “Student Scientific Research” competition of the
University of Medicine and Pharmacy at Ho Chi Minh City;
it reached the final round and the result is pending.
- “Development of docking and 2D-QSAR models for telomerase-inhibiting derivatives
with application in cancer treatment”
(abstract and confirmation letter from the supervisor attached)
Design, Synthesis and Biological Evaluation of some Chalcone
Derivatives as Potential Pancreatic Lipase Inhibitors
Hoai-Anh Nguyen, Trong-Nhat Do, Van-Dat Truong, Khac-Minh Thai, Ngoc-Chau Tran, Thanh-Dao Tran
Faculty of Pharmacy, Ho Chi Minh City University of Medicine and Pharmacy,
41 Dinh Tien Hoang Street, District 1, Ho Chi Minh City, Vietnam
ABSTRACT
Obesity is a growing global health problem, but few drugs are available for its treatment.
Several classes of compounds have been studied and shown to inhibit human lipase, a target
in obesity prevention. This study covers the design, synthesis and biological evaluation of
some synthetic chalcones as pancreatic lipase inhibitors.
FlexX software integrated in LeadIT was used for molecular docking studies of 66 chalcone
derivatives. Six derivatives with low docking scores (good binding affinity) were selected for
synthesis using both classical and microwave-assisted Claisen-Schmidt condensation reactions.
Biological evaluation on pancreatic lipase indicated that some chalcones showed good lipase
inhibition and that the in silico model correlated with the measured biological activities.
These results suggest that virtual screening tools can be applied to find potential agents
with high obesity-prevention capacity.
Keywords: lipase inhibitory activity, chalcone
DEVELOPMENT OF DOCKING AND 2D-QSAR MODELS
FOR TELOMERASE-INHIBITING DERIVATIVES
WITH APPLICATION IN CANCER TREATMENT
Đỗ Trọng Nhất, Đồng Quốc Hiệp, Phan Cường Huy,
Nguyễn Thị Thanh Lan, Nguyễn Đức Khánh Thơ
Supervisor: Assoc. Prof. Dr. Thái Khắc Minh
Keywords
Telomerase, oxadiazole, pyrazole, flavonoid, docking, 2D-QSAR, cancer
Introduction
According to World Health Organization (WHO) statistics, about 8.2 million people worldwide
died of cancer in 2012 alone. Telomerase is currently a promising target for new-generation
anticancer drugs, including oxadiazole, pyrazole and flavonoid derivatives. This study uses
a docking model to examine how these structural classes bind to telomerase and builds QSAR
models from the collected data. The models are then used to screen drugs already on the
market and to guide the design of new compounds able to inhibit telomerase, offering new
hope for cancer patients.
Materials and methods
FlexX software integrated in LeadIT was used to build a molecular docking model for 110
derivatives from 3 different structural classes, collected from 9 scientific papers together
with their IC50 values for inhibition of telomerase activity. The 2D-QSAR models were built
with the partial least squares (PLS) algorithm in MOE on a dataset of 41 oxadiazole
derivatives and 48 pyrazole derivatives.
Results and discussion
The amino acids that play an important role in binding are Lys189, Phe193 and Asp254. The
docking model shows that Fla–3d and Pyr–p16a have the best docking scores, consistent with
their experimental activities. Following a ligand-based drug design approach, 2 QSAR models
were built: one for the oxadiazole class with the descriptors SMR_VSA3, GCUT_SLOGP_1 and
PEOE_VSA-5, and one for the pyrazole class with the descriptors PEOE_VSA+1, b_1rotN and
PEOE_VSA-1. Both models predict activity with an error of less than 0.5 relative to the
experimental values. Combining QSAR with docking predicted several structures from these
2 classes with high telomerase inhibitory potential. QSAR predictions for marketed drugs
whose mechanism is not telomerase inhibition suggest that many of them may act on this
target enzyme.
Conclusion
The docking model built on the telomerase protein (ID: 3DU6) is of good quality and high
reliability. Telomerase inhibitory activity may result from binding to the hydrophobic pocket
above the ATP binding site of the enzyme. The predictions of the docking model and of the
2D–QSAR models in this study correlate with each other.
STUDY OF THE BINDING BETWEEN THE ENZYME HISTONE DEACETYLASE 2
AND HYDROXAMIC AND MERCAPTOACETAMIDE DERIVATIVES
Khac-Minh Thai*, Nhan-Tam Nguyen-Huu*, Trong-Nhat Do*, Minh-Tuyen Hua-Ngoc*, Cao-Son Doan**
SUMMARY
Introduction: The enzyme histone deacetylase (HDAC) is regarded as a target for the
development of anticancer drugs. [...] HDAC2.
Objective: [...] hydroxamic derivatives [...] selective binding sites for the enzyme.
Materials and methods: Hydroxamic and mercaptoacetamide derivatives with measured IC50
values for HDAC inhibition were studied [...] crystal structure of HDAC [...] LeadIT [...].
Results and discussion: [...] the amino acids that are important for the binding of the
studied derivatives to HDAC [...].
On HDAC2, the benzo[d]thiazole heterocycle binds effectively through a system of π-π
interactions with the amino acids Phe155 and Phe210.
Conclusion: In this study, molecular docking was carried out on the X-ray crystal structure
of the enzyme HDAC2 (3MAX) with hydroxamic and mercaptoacetamide derivatives. This docking
model can be applied to design compounds that inhibit HDAC2 strongly and selectively, with
the aim of finding active compounds applicable in cancer treatment.
Keywords: HDAC, docking, hydroxamic, mercaptoacetamide, cancer, drug design
ABSTRACT
MOLECULAR INTERACTION BETWEEN HISTONE DEACETYLASE AND HYDROXAMIC,
MERCAPTOACETAMIDE DERIVATIVES
Khac-Minh Thai, Nhan-Tam Nguyen-Huu, Trong-Nhat Do, Minh-Tuyen Hua-Ngoc, Cao-Son Doan
* Y Hoc TP. Ho Chi Minh * Vol. 18 - Supplement of No 2 - 2014:317-323
Introduction: The histone deacetylase (HDAC) enzyme has recently been considered one of the
targets for anticancer drug development. In this study, the molecular docking model of
hydroxamic derivatives on HDAC2 was analysed to identify the different binding regions of the
enzyme. The results could give insight into the interactions of HDACs and hydroxamic
derivatives at the molecular level and help in designing new selective HDAC inhibitors.
Material and methods: The hydroxamic and mercaptoacetamide derivatives (with HDAC2 IC50
values) were docked into the X-ray crystal structure of HDAC2 (pdb 3MAX) with LeadIT software.
Results and discussion: The docking results indicated the important amino acids in the
binding site of HDAC2, including Phe210, Phe155, Gly154, His146, His183, Asp269, Asp181,
His145, Arg39, Gly305, and Trp140. Our results also indicated that these compounds could
interact with HDAC2 better than with HDAC8. On HDAC2, the heterocyclic benzo[d]thiazole binds
effectively to the target via π-π interactions with Phe155 and Phe210.
Conclusion: This study described the molecular docking of hydroxamic and mercaptoacetamide
derivatives on the X-ray crystal structure of HDAC2 (3MAX). Our results could be applied to
design active, effective and selective HDAC2 inhibitors, which would be useful in cancer
treatment.
Keywords: HDAC, docking, hydroxamic, mercaptoacetamide, cancer, drug design
*Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City; ** National Institute of Drug Quality Control
Corresponding author: Dr. Thái Khắc Minh
Tel: 0909. 680. 385
Email: [email protected]
IN SILICO MODELING FOR ANTIMALARIAL COMPOUNDS
Thanh-Tan Mai, Trong-Nhat Do, Quoc-Hiep Dong, Duc-Khanh-Tho Nguyen,
Thanh-Man Le, and Khac-Minh Thai
Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi
Minh City; 41 Dinh Tien Hoang Street, District 1, Ho Chi Minh City, Vietnam; Email: [email protected]
Abstract
Malaria and the drug resistance of the parasite are serious current problems. The drug
discovery process requires a lot of time and money. Pharmacoinformatics can help by
virtually screening large numbers of chemical compounds quickly and at low cost, which opens
up prospects for antimalarial drug development. Counter-propagation neural networks were
used in this study to build classification and regression models for predicting antimalarial
activity in silico. A total of 8 classification models and 2 regression models were built;
they show good predictive ability for large chemical databases with diverse structural
scaffolds. Counter-propagation neural networks have shown a good ability to build multilayer
classification models and nonlinear regression models.
Key words: Malaria; Counter-propagation neural networks; Classification; Regression.
1. Introduction
Malaria is one of the most dangerous epidemic diseases in developing countries. According to
the World Malaria Report 2012 of the World Health Organization (WHO), there were an estimated
219 million cases of malaria and 660,000 deaths.[1] Malaria is caused by five species of
parasites of the genus Plasmodium that affect humans (P. falciparum, P. vivax, P. ovale,
P. malariae and P. knowlesi). Malaria due to P. falciparum (Pf) is the most deadly form and
it predominates in Africa and Southeast Asia. International disbursements to malaria-endemic
countries increased every year from less than US$ 100 million in 2000 to US$ 1.84 billion in
2012. Global resource requirements for malaria control were estimated in the 2008 Global
Malaria Action Plan to exceed US$ 5.1 billion per year between 2011 and 2020. In addition,
while current tools remain remarkably effective in most settings, resistance to artemisinins
(the key compounds in artemisinin-based combination therapies) has been detected in four
countries of South-East Asia.[1] Therefore, a new drug that is effective, safe and active
against resistant parasite strains is urgently needed.
The drug discovery process requires 10-15 years and costs a lot of money and effort. With
the support of pharmacoinformatics, the time and expense of this process can be reduced. In
addition, the available in vitro activities of potential antimalarial compounds provide a
good basis for building QSAR models that predict antimalarial activity in silico. Thus, this
study was conducted with the objective of building classification and regression models for
predicting the antimalarial activity of chemical compounds by counter-propagation neural
networks (CPG-NN).
2. Materials and Methods
Dataset
The in vitro antimalarial activities (IC50) of 1,126 structurally diverse compounds were
collected from the literature. In particular, 585 compounds have activities against
chloroquine (CQ) sensitive Pf strains and 705 compounds have activities against CQ resistant
Pf strains. For classification of antimalarial compounds with CPG-NN models, compounds with
activity higher than CQ (class 1) are assigned the value 1 and compounds with activity lower
than CQ (class 2) are assigned the value 0. After training, the output is a real number
between 0.0 and 1.0. For the final classification, the response weight values were
transformed to discriminative values (0 or 1) by applying a threshold value of 0.5 for each
class. For regression models, the activity value of a compound is pIC50 (equal to -log(IC50)).
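The data preparation described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, "higher activity than CQ" is assumed to mean a lower IC50 than chloroquine, and an output exactly at the threshold is assumed to count as class 1.

```python
import math

def pic50(ic50):
    """Return pIC50 = -log10(IC50); the IC50 value must be positive."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50)

def training_label(ic50, ic50_cq):
    """Return 1 for class 1 (more active than CQ), 0 for class 2.

    Assumption: higher activity corresponds to a lower IC50 than CQ.
    """
    return 1 if ic50 < ic50_cq else 0

def final_class(network_output, threshold=0.5):
    """Map a network output in [0.0, 1.0] to a discrete class (0 or 1).

    Assumption: an output exactly at the threshold counts as class 1.
    """
    return 1 if network_output >= threshold else 0
```

For instance, `final_class(0.73)` gives class 1 and `final_class(0.49)` gives class 2; both IC50 arguments of `training_label` must be in the same unit.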
Molecular descriptors and feature selection
A set of 184 different 2D descriptors was calculated for all compounds using the
descriptor tool in MOE [4]. The 2D descriptors are defined as numerical properties that can
be calculated from the connection table representation of a molecule. They include physical
properties, subdivided surface areas, atom counts and bond counts, Kier & Hall connectivity
and Kappa shape indices, adjacency and distance matrix descriptors, pharmacophore feature
descriptors, and partial charge descriptors. In addition, 2489 2D descriptors were
calculated with DRAGON [6]. They include topological descriptors, walk and path counts,
connectivity indices, information indices, 2D autocorrelations, edge adjacency indices,
Burden eigenvalues, topological charge indices, eigenvalue-based indices, 2D binary
fingerprints and 2D frequency fingerprints. To select an optimum set of molecular descriptors,
the QuaSAR-Contingency tool in MOE and the Select attributes tool in WEKA [7] were applied to
prune the large number of molecular descriptors.
Counter-propagation neural networks
CPG-NN is a supervised learning method with a two-layer architecture that can be used to
predict pIC50 values. The CPG-NN consists of a Kohonen neural network as the input layer and
an output layer related to the properties of the object. During training of a CPG-NN, the
winning neuron is determined exclusively on the basis of the input values, as in a regular
Kohonen network. Additionally, in a CPG-NN each neuron in the Kohonen layer has one or
several corresponding neurons in the output layer. Normally, a CPG-NN trained to predict
antimalarial activity would have a one-dimensional output layer holding the antimalarial
pIC50 value. [3]
All CPG-NN studies described were carried out with the software package SONNIA [5]. The
network topology used in this study was toroidal, with a width equal to the height, thus
resulting in square maps. The CPG-NN network size depended on the number of compounds in the
training set and comprised N neurons, with N equal to the number of compounds in the training
set. Other parameters were kept at the SONNIA defaults:
Epochs = 100; Interval = 1; Span(x) = Span(y) = N / 2 , Step(x) = Step(y) = Span/Epochs,
Rate = 0.5, Rate Factor = 0.995.
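To make the training scheme concrete, the following is a minimal, generic counter-propagation sketch in plain Python. It is not SONNIA's implementation: the function names are hypothetical, and the Gaussian neighbourhood, the linear shrinking of its radius and the way `rate` and `rate_factor` are applied are assumptions chosen for illustration. It does keep the properties stated above: a square toroidal map, a winner chosen from the input values only, and one output value per Kohonen neuron.

```python
import math
import random

def toroidal_offset(a, b, side):
    """Shortest distance between two indices on a wrap-around axis."""
    d = abs(a - b)
    return min(d, side - d)

def winner(w_in, x):
    """Return (row, col) of the neuron whose input weights are closest to x."""
    best, best_pos = float("inf"), (0, 0)
    for r, row in enumerate(w_in):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best:
                best, best_pos = d, (r, c)
    return best_pos

def train_cpg(X, y, side, epochs=100, rate=0.5, rate_factor=0.995, seed=0):
    """Train a toy counter-propagation network on a side x side toroidal map."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Kohonen (input) layer: random start; output layer: mean of the targets.
    w_in = [[[rng.random() for _ in range(n_feat)] for _ in range(side)]
            for _ in range(side)]
    y_mean = sum(y) / len(y)
    w_out = [[y_mean] * side for _ in range(side)]
    order = list(range(len(X)))
    for epoch in range(epochs):
        # Assumed schedule: radius shrinks linearly from side/2 down to 0.5.
        radius = max((side / 2.0) * (1 - epoch / epochs), 0.5)
        rng.shuffle(order)
        for i in order:
            wr, wc = winner(w_in, X[i])  # winner chosen from inputs only
            for r in range(side):
                for c in range(side):
                    d2 = (toroidal_offset(r, wr, side) ** 2
                          + toroidal_offset(c, wc, side) ** 2)
                    h = rate * math.exp(-d2 / (2 * radius ** 2))
                    w = w_in[r][c]
                    for k in range(n_feat):
                        w[k] += h * (X[i][k] - w[k])
                    # The output layer is adapted with the same factor.
                    w_out[r][c] += h * (y[i] - w_out[r][c])
        rate *= rate_factor
    return w_in, w_out

def predict_cpg(w_in, w_out, x):
    """Predict by reading the output weight stored at the winning neuron."""
    r, c = winner(w_in, x)
    return w_out[r][c]
```

With class labels 0 and 1 as targets, the prediction is a number between 0.0 and 1.0 that can be thresholded at 0.5; with pIC50 as the target, the same network acts as a regression model.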
Evaluation criteria for classification models
Accuracy
Accuracy is the fraction of observations correctly predicted. The performance of the
classification models was measured as total accuracy on all compounds and accuracy values
for each class.
Let TP and TN be the numbers of compounds with activity higher and lower than CQ,
respectively, that are correctly labeled by the CPG-NN model. The number of false positives
in the higher-than-CQ class is named FP, whereas FN accounts for false assignments to the
lower-than-CQ class; the total number of compounds is N = TP + FP + TN + FN.
Accuracy values were calculated as follows: (i) overall accuracy on all compounds, total
accuracy = (TP + TN)/(TP + FP + TN + FN); (ii) accuracy on the higher-than-CQ activity class,
TP/(TP + FN); (iii) accuracy on the lower-than-CQ activity class, TN/(TN + FP). [2]
Precision
The precision represents the probability that a compound assigned to a given class truly
belongs to it, i.e., the fraction of true positives among all cases predicted as positive.
Precision values were calculated as follows: (iv) precision on the higher-than-CQ activity class,
TP/(TP + FP); (v) precision on the lower-than-CQ activity class, TN/(TN + FN). [2]
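Formulas (i) to (v) can be written directly as a small Python helper. The function name and the example counts used with it are hypothetical, and the sketch assumes every denominator is non-zero (each class contains and receives at least one compound).

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy and precision values (i)-(v) from the four confusion counts."""
    total = tp + fp + tn + fn
    return {
        "total_accuracy": (tp + tn) / total,   # (i)
        "accuracy_higher": tp / (tp + fn),     # (ii) higher-than-CQ class
        "accuracy_lower": tn / (tn + fp),      # (iii) lower-than-CQ class
        "precision_higher": tp / (tp + fp),    # (iv)
        "precision_lower": tn / (tn + fn),     # (v)
    }
```

Calling `classification_metrics(60, 5, 30, 3)` on an invented test set of 98 compounds returns the five values as fractions between 0 and 1.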
GH score
The 'Goodness of Hit lists' or GH score was applied to measure the overall quality of
classification results. The GH score on each class takes into account both the precision (the
fraction of correct predictions within a class) and the percentage of this class that is
retrieved from the dataset. Those models where the GH scores of both classes are close to 1
(the maximum possible value) are considered the best. The GH score of a class is the mean of
its accuracy and its precision, so the GH scores for each class of antimalarial compounds are
defined as follows: (vi) GH score for the higher-than-CQ activity class,
TP(2TP + FN + FP)/[2(TP + FN)(TP + FP)]; (vii) GH score for the lower-than-CQ
activity class, TN(2TN + FN + FP)/[2(TN + FP)(TN + FN)]. [2] [3]
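As a worked illustration, the GH score of a class can be computed as the mean of that class's accuracy and precision, which equals TP(2TP + FN + FP)/[2(TP + FN)(TP + FP)] for the higher-than-CQ class. The factor 2 in the denominator is an assumption made here so that the maximum is 1, as the text requires; it also agrees with the GH values reported in the tables. The function and argument names are hypothetical.

```python
def gh_score(correct, missed, wrongly_assigned):
    """GH score of one class as the mean of class accuracy and precision.

    correct          -- members of the class predicted correctly (TP or TN)
    missed           -- members of the class assigned to the other class
    wrongly_assigned -- members of the other class assigned to this class
    """
    class_accuracy = correct / (correct + missed)
    precision = correct / (correct + wrongly_assigned)
    return (class_accuracy + precision) / 2
```

For the higher-than-CQ class the call is `gh_score(TP, FN, FP)`; for the lower-than-CQ class it is `gh_score(TN, FP, FN)`. A perfect classification gives 1.0.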
3. Results and Discussion
Classification models for predicting antimalarial activity
A general classification model for predicting activity against Pf was built with CPG-NN. In
addition, 3 classification models were built for predicting activity against CQ sensitive Pf
strains and 3 classification models for CQ resistant Pf strains.
Classification model for predicting activities against Pf
This model was named CPG-C PF and was built with 7 molecular descriptors on a dataset of
487 compounds (341 in class 1 and 146 in class 2). The CPG network size was 19 × 19 neurons
and the number of training cycles was set to 100. Using the training and test sets obtained
by diverse splitting and five-times random splitting (80:20), the results show similarly high
values for accuracy, precision and GH-score for both the training and the test sets in the
two splitting methods (Table 1). The CPG-NN average map obtained for the diverse
training set is presented in Figure 1.
Figure 1: Output CPG-NN average map for the classification model for activities against Pf
(19 × 19), with the diverse subset as the training set. Blue: 0; Orange: 1; White: empty; Black: conflict.
With diverse splitting, the total accuracy values obtained for the training and test sets are
0.99 and 0.98, respectively. In each class, the accuracies achieved range from 0.98 to 0.96
for the training set and from 0.80 to 0.94 for the test set. In addition, the GH scores
obtained for the training set were 0.99 for class 1 and 0.98 for class 2. For the test set,
the GH scores were 0.99 and 0.96 for class 1 and class 2, respectively.
For the training and test sets obtained by five-times random splitting (80/20), the total
accuracies showed an average value of 0.99 for the training set of 390 compounds
and of 0.96 for the test set of 98 compounds (Table 1). The compounds in class 1
reach a value of 0.99 for both accuracy and GH score in the training set, and an accuracy
of 0.98 and a GH score of 0.97 in the test set. The compounds in class 2 reach a value of
0.98 for both accuracy and GH score in the training set, and an accuracy of 0.91 and a GH
score of 0.94 in the test set. Thus, CPG-C PF appears to classify compounds in both classes
well.
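The five-times random 80:20 splitting can be sketched as below. This is an assumption about the procedure: it draws five independent random splits, whereas the study may have used disjoint test folds, and the diverse-subset selection is not reproduced here. The function name and the seed are hypothetical.

```python
import random

def random_splits(n_items, n_repeats=5, test_fraction=0.2, seed=0):
    """Return n_repeats (train_indices, test_indices) pairs of random splits."""
    rng = random.Random(seed)
    n_test = round(n_items * test_fraction)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the test set, the rest train.
        splits.append((sorted(indices[n_test:]), sorted(indices[:n_test])))
    return splits
```

For 488 compounds this yields 390 training and 98 test compounds per split, the set sizes reported for CPG-C PF. Performance figures are then averaged over the five splits.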
Table 1: Summary of the classification power of CPG-NN for antimalarial activity against Pf (model CPG-C PF)

Descriptors: b_single, PEOE_VSA+0, PEOE_VSA+2, PEOE_VSA+5, opr_nring, SMR_VSA4, logP(o/w)

                               Diverse (a)                        Random (b)
CPG-C PF                       Train   Test   Y-scrambling (c)    Train   Test
N                              390     98     390                 390     98
Total accuracy                 0.99    0.98   0.56                0.99    0.96
Higher than CQ   Accuracy      0.99    0.97   0.68                0.99    0.98
                 Precision     0.99    1.00   0.67                0.99    0.96
                 GH            0.99    0.99   0.68                0.99    0.97
Lower than CQ    Accuracy      0.98    1.00   0.30                0.98    0.91
                 Precision     0.98    0.92   0.30                0.98    0.97
                 GH            0.98    0.96   0.30                0.98    0.94

(a) Training set is the diverse subset; (b) five-fold leave-20%-out; (c) Y-scrambling (50 times) of the diverse training set.
(+) Compounds with antimalarial activity higher than CQ; (-) Compounds with activity lower than CQ.
Classification models for activities against CQ sensitive and resistant Pf strains
For classifying activity against CQ sensitive Pf strains, 3 models named CPG-C S1, S2 and
S3 were built with 5 MOE descriptors, 10 DRAGON descriptors and the 15 descriptors combined
from the CPG-C S1 and S2 models, respectively. The dataset includes 585 compounds. The CPG
network size was 21 × 21 neurons and the number of epochs was set to 100. Using the diverse
subset as the training set and the remaining compounds as the test set, the results show
similarly high values for accuracy, precision and GH-score for both the training and the
test sets (Table 2). The CPG-NN average maps obtained for the training sets of CPG-C S1, S2
and S3 are presented in Figure 2. Among them, CPG-C S3 is the best model, with a total
accuracy of 0.94 for the training set and 0.85 for the test set.
Figure 2: Output CPG-NN average maps for the CPG-C S1, S2 and S3 models, respectively, for
activities against CQ sensitive Pf strains (21 × 21), with the diverse subset as the training set.
Blue: 0; Orange: 1; White: empty; Black: conflict.
Table 2: Summary of the classification power of CPG-NN for antimalarial activity against CQ sensitive Pf strains

Descriptors
CPG-C S1: GCUT_SLOGP_3, BCUT_PEOE_0, b_1rotN, a_nN, SlogP_VSA0
CPG-C S2: nR07, nArNHO, T(Br..Br), T(F..Cl), nROR, B04[N-F], B07[N-N], B08[N-F], H-048, EEig04r
CPG-C S3: combined from the CPG-C S1 and S2 models

                   CPG-C S1                   CPG-C S2                   CPG-C S3
                   Train(a)  Test   Y-scr.    Train(a)  Test   Y-scr.    Train(a)  Test   Y-scr.
N                  468       117    468       468       117    468       468       117    468
Total accuracy     0.95      0.84   0.70      0.92      0.84   0.71      0.94      0.85   0.69
(+) Accuracy       0.86      0.76   0.20      0.71      0.61   0.17      0.80      0.70   0.25
(+) Precision      0.84      0.74   0.22      0.87      0.68   0.21      0.85      0.74   0.18
(+) GH             0.85      0.75   0.21      0.79      0.64   0.19      0.83      0.72   0.22
(-) Accuracy       0.97      0.87   0.83      0.97      0.91   0.84      0.97      0.90   0.80
(-) Precision      0.97      0.88   0.81      0.93      0.88   0.81      0.96      0.88   0.82
(-) GH             0.97      0.88   0.82      0.95      0.90   0.83      0.96      0.89   0.81

(a) Training set is the diverse subset. Y-scr.: Y-scrambling.
(+) Compounds with antimalarial activity higher than CQ; (-) Compounds with activity lower than CQ.
Moreover, three models, CPG-C R1, R2 and R3, for classifying activity against CQ resistant
Pf strains were built using 7 MOE descriptors, 11 DRAGON descriptors and the 18 descriptors
combined from the CPG-C R1 and R2 models, respectively. The dataset included 701 compounds.
The diverse subset was used as the training set and the remaining compounds as the test set;
the CPG network size was 23 × 23 and the number of epochs was set to 100 (Table 3). The
CPG-NN average maps obtained for the training sets of CPG-C R1, R2 and R3 are presented in
Figure 3. As with the models for activity against CQ sensitive Pf strains, CPG-C R3 is the
best model, with a total accuracy of 0.94 for the training set and 0.91 for the test set.
Figure 3: Output CPG-NN average maps for the CPG-C R1, R2 and R3 models, respectively, for
activities against CQ resistant Pf strains (23 × 23), with the diverse subset as the training set.
Blue: 0; Orange: 1; White: empty; Black: conflict.
Table 3: Summary of the classification power of CPG-NN for antimalarial activity against CQ resistant Pf strains

Descriptors
CPG-C R1: GCUT_PEOE_3, BCUT_PEOE_0, PEOE_VSA-0, PEOE_VSA+2, SMR_VSA2, vsa_acc, chiral_u
CPG-C R2: C-012, nR10, nRNHO, nArC=N, MSD, GATS1m, AAC, TI2, B05[N-Cl], RBN, EEig02x
CPG-C R3: combined from the CPG-C R1 and R2 models

                   CPG-C R1                   CPG-C R2                   CPG-C R3
                   Train(a)  Test   Y-scr.    Train(a)  Test   Y-scr.    Train(a)  Test   Y-scr.
N                  561       140    561       561       140    561       561       140    561
Total accuracy     0.93      0.85   0.54      0.95      0.89   0.54      0.94      0.91   0.54
(+) Accuracy       0.89      0.69   0.38      0.92      0.81   0.37      0.91      0.83   0.35
(+) Precision      0.91      0.93   0.39      0.95      0.93   0.38      0.93      0.98   0.38
(+) GH             0.90      0.81   0.38      0.93      0.87   0.38      0.92      0.91   0.36
(-) Accuracy       0.95      0.96   0.64      0.97      0.95   0.64      0.96      0.99   0.66
(-) Precision      0.94      0.81   0.64      0.95      0.86   0.63      0.95      0.85   0.63
(-) GH             0.94      0.89   0.64      0.96      0.90   0.64      0.96      0.92   0.64

(a) Training set is the diverse subset. Y-scr.: Y-scrambling.
(+) Compounds with antimalarial activity higher than CQ; (-) Compounds with activity lower than CQ.
All six classification models showed high values for accuracy, precision and GH-score.
However, the CPG-C S1, S2 and S3 models showed unequal predictive ability for class 1 and
class 2: the GH-scores achieved for the training set ranged from 0.79 to 0.85 in class 1
and from 0.95 to 0.96 in class 2. In contrast, the CPG-C R1, R2 and R3 models showed more
balanced predictive ability for the two classes, with training-set GH-scores ranging from
0.90 to 0.93 in class 1 and from 0.94 to 0.96 in class 2. This can be explained by the more
even numbers of compounds in the two classes in the dataset of compounds with activity
against CQ resistant Pf strains.
Regression model for activities against CQ sensitive and resistant Pf strains
CPG-NN can also be used for regression analyses and prediction of pIC50 values. Two
regression models, CPG-R1 and CPG-R2, were built to predict pIC50 values for activity
against CQ sensitive and CQ resistant Pf strains, respectively.
CPG-R1 was built with a dataset of 572 compounds (CPG network size set to 21 × 21)
and 7 MOE descriptors (logP(o/w), rings, vsa_acc, GCUT_SLOGP_0, BCUT_PEOE_1,
SMR_VSA3, lip_druglike). The training set based on diversity showed good performance
(Table 4 and Figure 4) with R2 = 0.88 (RMSE = 0.44), and the test set also performed quite
well (R2 = 0.70, RMSE = 0.61). With five-times random division, R2 = 0.88 and RMSE =
0.42 were obtained for the training set. However, validation on the test sets revealed
rather lower performance (R2 = 0.65, RMSE = 0.76).
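The two regression statistics can be computed as in the sketch below. The text does not define R2; it is assumed here to be the squared Pearson correlation between observed and predicted pIC50, as for a fitted trend line. The function names and sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)
```

Under this definition a model can have a high R2 and still a large RMSE if its predictions are systematically shifted, which is why both statistics are reported together.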
Table 4: Summary of the regression power of the CPG-NN approach for antimalarial activity

                    Diverse (a)                      Random (b)
Model     Metric    Train   Test   Y-scrambling      Train   Test
CPG-R1    N         458     114    458               458     114
          RMSE      0.44    0.61   1.74              0.42    0.76
          R2        0.88    0.70   0.0032            0.88    0.65
CPG-R2    N         553     138    553               553     138
          RMSE      0.36    0.46   1.80              0.37    0.69
          R2        0.92    0.84   0.0005            0.93    0.76

(a) Training set is the diverse subset; (b) five-fold leave-20%-out.
CPG-R2 was built with a dataset of 691 compounds (network size set to 23 × 23) and 8
MOE descriptors (PEOE_VSA+0, PEOE_VSA-0, SlogP_VSA0, SMR_VSA4, SlogP_VSA2,
vsa_other, rings, b_triple). This model performed better than CPG-R1, with
R2 = 0.92 (RMSE = 0.36) for the training set in diverse division and R2 = 0.93
(RMSE = 0.37) for the training set in random division (Table 4 and Figure 5).
Figure 4: Calculated versus observed antimalarial pIC50 plots for the CPG-R1 regression
model for the diverse training set. Axes: pIC50 observed (x), pIC50 predicted (y). Fitted
lines: y = 0.8776x + 0.0567 (R² = 0.8807) and y = 0.8731x + 0.0734 (R² = 0.7032).
Figure 5: Calculated versus observed antimalarial pIC50 plots for the CPG-R2 regression
model for the diverse training set. Axes: pIC50 observed (x), pIC50 predicted (y). Fitted
lines: y = 0.9252x + 0.0383 (R² = 0.9232) and y = 0.8748x + 0.0492 (R² = 0.8359).
Conclusion
CPG-NN was used to build classification and regression models with good predictive
ability for large chemical databases with diverse structural scaffolds. The CPG-C PF model
can be applied to classify activity against Pf. CPG-C S3 and CPG-C R3 are the best models
for classifying activity against CQ sensitive and CQ resistant Pf strains, respectively.
In addition, the two regression models CPG-R1 and CPG-R2 are good models for predicting
IC50 values against Pf. These models are useful for virtual screening of antimalarial
activity in large libraries of chemical compounds.
Acknowledgement
This research is funded by the Department of Science and Technology, Ho Chi Minh City
under grant number VLM-139-15-ds2011 (to Khac-Minh Thai).
References
1. K. Andrews, M. Aregawi, R. Cibulskis, M. Lynch, R. Newman, R. Williams. World Malaria Report 2012. 2012; v-xiii.
2. O. Carugo. Detailed estimation of bioinformatics prediction reliability through the Fragmented Prediction Performance Plots. BMC Bioinformatics, 2007; 8: 380.
3. K.M. Thai, G.F. Ecker. Classification models for HERG inhibitors by counter-propagation neural networks. Chem Biol Drug Des, 2008; 72(4): 279-289.
4. Chemical Computing Group, Molecular Operating Environment (MOE) 2008.10, http://www.chemcomp.com, Access date: 30/06/2013.
5. Molecular Networks, SONNIA 4.2, http://www.molecular-networks.com, Access date: 27/6/2013.
6. Telete srl, DRAGON 5.5 (2007), http://www.telete.mi.it, Access date: 27/6/2013.
7. Waikato Environment for Knowledge Analysis, Weka 3.6, http://www.cs.waikato.ac.nz/~ml/weka/, Access date: 26/6/2013.