
Problem with the pretrained model on autoencoding #78

Open
jerryhluo opened this issue Apr 6, 2018 · 1 comment
@jerryhluo

Dear author,

I downloaded your code and the pre-trained model (model_500k.h5) and tried the following commands:
python preprocess.py data/smiles_500k.h5 data/processed_500k.h5
python sample.py data/processed_500k.h5 data/model_500k.h5 --target autoencoder

Then it outputs:

NC(=O)c1nc(cnc1N)c2ccc(Cl)c(c2)S(=O)(=O)Nc3cccc(Cl)c3
(-> encoder -> decoder ->)
7-7ASC-F@@7N7AAAAAAAAAAAAAlllllNAACAAC7lll7AlllAAACC%CLA-VVVVVVVVFF--lAAAAAAAAAAAAAAVVAAAAACCAACCAAACAAACCA77A-VVV--

I am not sure what happened with the pre-trained model; it does not seem to reconstruct the input at all. Do you see a similar problem, or did I do something wrong?

@jerryhluo
Author

jerryhluo commented Apr 9, 2018

Found part of the reason:
Python 2 happens to build the "charset" variable in a consistent order (A->Z), while in Python 3 the iteration order of a set is effectively random (due to hash randomization).
See https://stackoverflow.com/questions/9792664/set-changes-element-order

In addition, the charset for the 500k SMILES varies between runs (due to the sampling function in preprocess.py), so it's important for users to keep using the same set of generated files.
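For reference, a minimal sketch of how the charset could be made deterministic (this is not the repo's actual preprocess.py code; the variable names are illustrative). Sorting the set fixes the character order across runs and Python versions, regardless of hash randomization:

```python
# Deterministic charset extraction from a list of SMILES strings.
# sorted() gives a stable character order, unlike iterating a plain set(),
# whose order varies under Python 3's hash randomization.
smiles = [
    "NC(=O)c1nc(cnc1N)c2ccc(Cl)c(c2)S(=O)(=O)Nc3cccc(Cl)c3",
    "CCO",
]
charset = sorted(set("".join(smiles)))
char_to_index = {c: i for i, c in enumerate(charset)}
```

With a sorted charset, the one-hot encoding indices match between preprocessing and sampling, which is what the pre-trained model depends on.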

Could you please provide the charset used to train the pre-trained model? The model dimensions also depend on the charset.
@maxhodak


Updated on 4/23/2018
Solution found at https://github.com/chembl/autoencoder_ipython
