Repository of codes and data for Estrogen Receptor Alpha QSAR modeling

AzraelXu/estrogen-receptor-alpha-qsar

 
 


Probing the origin of estrogen receptor alpha inhibition via large-scale QSAR study

This repository contains the folders and files that constitute the entire workflow used in this study to construct QSAR models for predicting the pIC50 values of estrogen receptor alpha inhibition.
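For readers unfamiliar with the modeled endpoint, pIC50 is the negative base-10 logarithm of the IC50 expressed in molar units. The conversion can be sketched as below. This is a minimal illustration, not code taken from the notebooks: it assumes the IC50 values are given in nM (this README does not state the unit), and the function name `pic50_from_nm` is hypothetical.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


# An IC50 of 1000 nM (1 micromolar) corresponds to a pIC50 of 6.
print(pic50_from_nm(1000.0))
```

Higher pIC50 values therefore indicate more potent compounds, which is why QSAR models are usually fitted to pIC50 rather than to raw IC50.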

Files

| File name | Description |
| --- | --- |
| 01_ER_alpha_preparation.ipynb | Retrieves bioactivity data from the ChEMBL database, then curates and pre-processes the data |
| 02_ER_alpha_RO5.ipynb | Performs Lipinski's rule-of-five analysis |
| 03_Fingerprint_gen.ipynb | Calculates fingerprint descriptors |
| 04_Regression.ipynb | Constructs the initial QSAR models via random forest to obtain the set of top 20 descriptors |
| 05_Regression_select_importance.ipynb | Constructs QSAR models using the top 20 descriptors |
| 06_ER_alpha_preparation-test.ipynb | Prepares the input CSV file of the external set (compounds whose bioactivity label contains the < or > symbol) |
| 07_External_test.ipynb | Applies the constructed QSAR model to the external set prepared by 06_ER_alpha_preparation-test.ipynb |
| 08_Applicability_domain.ipynb | Performs applicability domain analysis via the PCA bounding box approach |
| environment.yml | The conda environment file that replicates the Python environment (specific versions of installed packages) used in this study |
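As an illustration of the rule-of-five analysis performed in 02_ER_alpha_RO5.ipynb, the check can be sketched as follows. This is a simplified sketch, not the notebook's code: it assumes the four descriptors have already been computed by a cheminformatics toolkit, the names `Descriptors`, `ro5_violations` and `passes_ro5` are hypothetical, and the descriptor values in the example are made up for demonstration. Tolerating one violation is a common convention and is also an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the compound violates."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Return True if the compound has at most `max_violations` violations."""
    return ro5_violations(d) <= max_violations


# Made-up descriptor values, for demonstration only.
small = Descriptors(mol_weight=272.4, log_p=4.0, h_donors=2, h_acceptors=2)
large = Descriptors(mol_weight=650.0, log_p=6.2, h_donors=6, h_acceptors=11)
print(ro5_violations(small), passes_ro5(small))
print(ro5_violations(large), passes_ro5(large))
```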

Folders

| Folder name | Description |
| --- | --- |
| applicability_domain | Contains CSV files and output PDF files generated via 08_Applicability_domain.ipynb |
| Fingerprint | Contains CSV files of fingerprint descriptors calculated by the PaDEL software |
| model | Contains CSV files of bioactivity data obtained programmatically from the ChEMBL database |
| PaDEL-Descriptor | Contains the PaDEL JAR file along with fingerprint XML files |
| QSAR | Contains CSV files of fingerprint descriptors along with bioactivity data of all compounds used for QSAR model building |
| QSAR_select | Contains CSV files of the top 20 descriptors (from feature selection) used for building the final QSAR model |
| Result | Contains all results data |
| second_external_set | Contains XLSX files of the second external set, in which the bioactivity label contains the < or > symbol |
| smiles | Contains SMILES data of all compounds used in this project |
| SubFiles | Contains raw data files used in constructing plots |
| Train_Fp_normalized | Contains fingerprint descriptors after normalization |
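The PCA bounding box idea behind the applicability domain analysis can be sketched as follows. This is a generic sketch of the technique, not the code in the notebook: it assumes NumPy is available, uses randomly generated binary fingerprints in place of the real descriptors, and the names `fit_pca_box` and `in_domain` are hypothetical. A compound is treated as inside the domain when its scores on the leading principal components fall within the minimum and maximum scores of the training set.

```python
import numpy as np


def fit_pca_box(X_train, n_components=2):
    """Fit PCA on the training descriptors and record the score ranges."""
    mean = X_train.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    axes = vt[:n_components]
    scores = (X_train - mean) @ axes.T
    return mean, axes, scores.min(axis=0), scores.max(axis=0)


def in_domain(X, mean, axes, lo, hi):
    """Flag rows of X whose PCA scores lie inside the training bounding box."""
    scores = (X - mean) @ axes.T
    return np.all((scores >= lo) & (scores <= hi), axis=1)


# Random binary "fingerprints" standing in for real descriptor data.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(50, 16)).astype(float)
mean, axes, lo, hi = fit_pca_box(X_train)

# A synthetic point pushed far along the first principal axis.
outlier = (mean + 1000.0 * axes[0])[None, :]
print(in_domain(X_train, mean, axes, lo, hi).all())
print(in_domain(outlier, mean, axes, lo, hi))
```

Predictions for compounds flagged as outside the box are extrapolations and deserve less confidence than predictions for compounds inside it.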

Citing this work

If you use this code and data, please cite the following paper:

Suvannang N, Preeyanon L, Malik AA, Schaduangrat N, Shoombuatong W, Worachartcheewan A, Tantimongcolwat T, Nantasenamat C. Probing the origin of estrogen receptor alpha inhibition via large-scale QSAR study. RSC Advances 8 (2018) 11344-11356.
