
Problem with the pretrained model on autoencoding #78

Open
jerryhluo opened this issue Apr 6, 2018 · 1 comment
@jerryhluo

Dear author,

I downloaded your code and the pre-trained model (model_500k.h5) and tried the following commands:
python preprocess.py data/smiles_500k.h5 data/processed_500k.h5
python sample.py data/processed_500k.h5 data/model_500k.h5 --target autoencoder

Then it outputs:

NC(=O)c1nc(cnc1N)c2ccc(Cl)c(c2)S(=O)(=O)Nc3cccc(Cl)c3
(-> encoder -> decoder ->)
7-7ASC-F@@7N7AAAAAAAAAAAAAlllllNAACAAC7lll7AlllAAACC%CLA-VVVVVVVVFF--lAAAAAAAAAAAAAAVVAAAAACCAACCAAACAAACCA77A-VVV--

I am not sure what happened with the pre-trained model; it does not seem to reconstruct the input at all. Do you see a similar problem, or did I do something wrong?

@jerryhluo
Author

jerryhluo commented Apr 9, 2018

Found part of the reason:
Python 2 happens to build the "charset" variable in a consistent order (A->Z), while in Python 3 the iteration order of a set is effectively random (due to hash randomization).
See https://stackoverflow.com/questions/9792664/set-changes-element-order

In addition, the charset for the 500k SMILES varies between runs (due to the sampling function in preprocess.py), so it's important for users to keep using the same set of generated files.
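For reference, a minimal sketch of how the charset could be made deterministic (this is not the repo's actual preprocess.py code; the variable names are illustrative). Sorting the set fixes the character order across runs and Python versions, regardless of hash randomization:

```python
# Deterministic charset extraction from a list of SMILES strings.
# sorted() gives a stable character order, unlike iterating a plain set(),
# whose order varies under Python 3's hash randomization.
smiles = [
    "NC(=O)c1nc(cnc1N)c2ccc(Cl)c(c2)S(=O)(=O)Nc3cccc(Cl)c3",
    "CCO",
]
charset = sorted(set("".join(smiles)))
char_to_index = {c: i for i, c in enumerate(charset)}
```

With a sorted charset, the one-hot encoding indices match between preprocessing and sampling, which is what the pre-trained model depends on.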

Could you please provide the charset used to train the pre-trained model? The model dimensions also depend on the charset.
@maxhodak


Updated on 4/23/2018
Solution found at https://github.com/chembl/autoencoder_ipython
