Add synthesis dataset #18

Open
qai222 opened this issue Sep 22, 2022 · 2 comments

Comments

@qai222
Collaborator

qai222 commented Sep 22, 2022

Hey, I'm working on adding a set of perovskite synthesis reactions to src/olympus/datasets as a benchmark dataset (following a previous discussion with @jschrier earlier this month). A brief description of this dataset can be found here.
Questions:

  1. It appears all of the benchmark datasets currently in src/olympus/datasets have continuous targets, but our synthesis dataset has a categorical target. Is this a problem?
  2. What is the best practice to include descriptors for categorical parameters? (I see you have a descriptor.csv in this folder, but would it be better to just include the descriptors directly like this?)
@qai222 qai222 assigned rileyhickman and unassigned rileyhickman Sep 29, 2022
@qai222
Collaborator Author

qai222 commented Sep 29, 2022

Any update on this, @rileyhickman? (Apologies if you're not the right person to ask.)

@rileyhickman
Contributor

Hi @qai222,

  1. Thanks for pointing this out. We don't currently support categorical targets, but this is something we could implement in Olympus moving forward. I'd be happy to discuss this further with you so that we can include your synthesis dataset.
  2. By convention, we store descriptors in a CSV file whose columns, from left to right, are the categorical parameter name (e.g., organic), the option name (e.g., ethylammonium), the descriptor name (e.g., homo), and the descriptor value (e.g., -0.46). I realize this format is slightly unconventional, but I've found it streamlines organizing the Dataset instance. I'd be happy to discuss ways to improve it further.
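
For illustration, here is a minimal sketch of how a long-format descriptor file with that column order could be read into a nested lookup. It is not Olympus code: the function name `load_descriptors`, the absence of a header row, and the nested-dict layout are all assumptions. The first data row uses the example values mentioned above; the second row is a made-up placeholder.

```python
import csv
import io
from collections import defaultdict

# Long-format descriptor table: one descriptor value per row.
# Assumed column order: parameter, option, descriptor, value (no header row).
# Row 1 uses the example values from the comment above.
# Row 2 is a made-up placeholder, included only to show a second descriptor.
RAW = """\
organic,ethylammonium,homo,-0.46
organic,ethylammonium,placeholder_descriptor,1.0
"""


def load_descriptors(handle):
    """Return {parameter: {option: {descriptor: value}}} from a long-format CSV.

    Hypothetical helper for illustration; not part of the Olympus API.
    """
    table = defaultdict(lambda: defaultdict(dict))
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        parameter, option, descriptor, value = (cell.strip() for cell in row)
        table[parameter][option][descriptor] = float(value)
    return table


descriptors = load_descriptors(io.StringIO(RAW))
print(descriptors["organic"]["ethylammonium"]["homo"])
```

One row per (parameter, option, descriptor) triple means a new descriptor can be added without changing the file's columns, which may be part of why the format is convenient when building the dataset object.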
