add script to convert rdkit mols to networkx graphs with attributes #32
BTW, with this script I went on to print all the 8-membered SMILES strings in GDB-13 (8.smi) along with their graph6 representations. Sorting by graph6 representation shows that some graph6 strings are associated with only one 8-membered SMILES string in GDB-13, for example: Looking at 9-membered SMILES strings shows similar results. Part of this is due to GDB-13 being a selective enumeration, so the counts for various graphs are affected by GDB-13's aggressive filters during the graph enumeration phase. |
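To make the graph6 grouping concrete, here is a minimal, dependency-free sketch. It is not the script from this PR: `to_graph6` and `group_by_graph6` are hypothetical helper names, and the hand-written atom counts and edge lists stand in for what rdkit would supply. Note that a graph6 string depends on node order, so two isomorphic graphs are only guaranteed to share a string after canonical labeling, which this sketch does not do.

```python
from collections import defaultdict


def to_graph6(n, edges):
    """Encode a simple undirected graph on nodes 0..n-1 as a graph6 string.

    Only the short form of the format (n <= 62) is handled in this sketch.
    """
    if not 0 <= n <= 62:
        raise ValueError("this sketch only handles 0 <= n <= 62")
    adjacent = {(min(u, v), max(u, v)) for u, v in edges}
    # Upper triangle of the adjacency matrix, column by column:
    # (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), ...
    bits = [1 if (i, j) in adjacent else 0 for j in range(1, n) for i in range(j)]
    bits += [0] * (-len(bits) % 6)  # pad with zeros to a multiple of 6
    chars = [chr(n + 63)]  # first character encodes the node count
    for k in range(0, len(bits), 6):
        value = 0
        for bit in bits[k:k + 6]:
            value = (value << 1) | bit
        chars.append(chr(value + 63))  # each 6-bit group becomes one printable character
    return "".join(chars)


def group_by_graph6(molecules):
    """Map graph6 string -> list of SMILES, given (smiles, n, edges) triples."""
    groups = defaultdict(list)
    for smiles, n, edges in molecules:
        groups[to_graph6(n, edges)].append(smiles)
    return dict(groups)


# Hypothetical input: heavy-atom skeletons written out by hand.
molecules = [
    ("CCC", 3, [(0, 1), (1, 2)]),
    ("CCO", 3, [(0, 1), (1, 2)]),
    ("C1CC1", 3, [(0, 1), (1, 2), (0, 2)]),
]
groups = group_by_graph6(molecules)
```

A graph6 key whose list has length one corresponds to the "only one SMILES string" case described above.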
|
I have several plans for the SMILES parser. It exists mainly because I
couldn't find any other SMILES parser that represented the SMILES as an AST
(rdkit converts it to a molecule, which is technically an AST for
SMILES...) in Python that I could easily traverse.
Goals include:
1) generative production of valid SMILES strings, including modifying
existing molecules, extracting the graph structure
2) exploring alternatives to the simple single-letter charset (like 'Br'
and 'Cl'), as well as expressing the tree structure directly.
3) and using graph isomorphism tools to cluster similar structures.
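One reading of goal 2 is that a character-level vocabulary splits 'Br' and 'Cl' into two symbols each. The sketch below is an assumption about what a token-level alternative could look like, not code from smilesparser; `TOKEN`, `tokenize`, and `build_vocab` are hypothetical names, and the regular expression covers only bracket atoms, the two halogens, and two-digit ring closures before falling back to single characters.

```python
import re

# Bracket atoms, two-letter halogens, two-digit ring closures (%NN),
# then any other single character.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens so that 'Br' and 'Cl' stay whole."""
    return TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Assign an integer index to every distinct token, in sorted order."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


print(tokenize("CC(Br)Cl"))
```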
On Thu, Nov 24, 2016 at 11:24 PM, Max Hodak ***@***.***> wrote:
smilesparser.py looks pretty interesting. What direction are you thinking
of taking this?
|
max, etc:
I got approved to release the smilesparser as Google Open Source code with
Apache 2 license. Woot.
https://github.com/google/smilesparser
You can see some examples of recursively iterating over a SMILES string's
parsed structure:
|
Sorry, the example for iteration is:
https://github.com/google/smilesparser/blob/master/test_smilesparser_object.py
That code is a sufficient example to see how to identify terminals, and it
should be pretty obvious how to parse out element names if you wanted to
use those to define the charset (although at this point I don't think it
really matters).
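As a rough illustration of identifying terminals and pulling out element names, the sketch below walks a nested-list tree. The tree shape is an assumption made for illustration and is not the structure smilesparser returns (see the linked test file for that); `iter_terminals` and `element_names` are hypothetical helpers.

```python
def iter_terminals(node):
    """Yield the leaf strings of a nested-list tree in left-to-right order."""
    if isinstance(node, str):
        yield node
    else:
        for child in node:
            yield from iter_terminals(child)


def element_names(tree):
    """Collect the distinct alphabetic terminals, e.g. to define a charset."""
    return sorted({t for t in iter_terminals(tree) if t.isalpha()})


# Hypothetical parse of C=C(Br)Cl: branches are nested lists, bonds are strings.
tree = ["C", "=", "C", ["Br"], "Cl"]
print(list(iter_terminals(tree)))
print(element_names(tree))
```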
|
@dakoner I'm a lurker on this repo, but very cool to see a good python SMILES parser. Do you think it would be tough to extend it to handle Reaction SMARTS? (for better support of chemical reactions) |
I assume it's straightforward.
This parser was constructed by taking an existing BNF grammar for SMILES
and manually translating it to pyparsing. There is a simple transformation
for most grammars to pyparsing; the only tricky parts involve recursively
defined elements (see the pp.Forward() lines in smilesparser.py). Be sure
to call validate() (a pyparsing ParserElement method) on your grammars to
check that you don't have infinite recursion.
If there is a BNF for Reaction SMARTS, then you can
just translate it the same way I did, but I couldn't find one. I'm sure you
could also write a parser from this page:
http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
although it would take a bit more effort than translating a BNF.
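To show why recursively defined elements are the tricky part, here is a dependency-free sketch that uses hand-written recursive descent rather than pyparsing. The toy grammar (`chain ::= (atom | branch)+`, `branch ::= "(" chain ")"`) and the names `ATOMS`, `parse_chain`, and `parse` are assumptions for illustration, not the smilesparser grammar. The branch rule refers back to chain, which is the same circular reference that a pyparsing grammar declares up front with pp.Forward().

```python
# Two-letter symbols come first so that "Cl" is not read as "C" followed by "l".
ATOMS = ("Br", "Cl", "B", "C", "N", "O", "P", "S", "F", "I")


def parse_chain(text, pos=0):
    """chain ::= (atom | branch)+ ; returns (nested list, next position)."""
    items = []
    while pos < len(text):
        if text[pos] == "(":
            # branch ::= "(" chain ")" : the rule refers back to chain.
            inner, pos = parse_chain(text, pos + 1)
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unclosed branch")
            items.append(inner)
            pos += 1
            continue
        symbol = next((a for a in ATOMS if text.startswith(a, pos)), None)
        if symbol is None:
            break  # not part of this chain; let the caller decide
        items.append(symbol)
        pos += len(symbol)
    if not items:
        raise ValueError(f"expected an atom or branch at position {pos}")
    return items, pos


def parse(text):
    """Parse the whole string or raise ValueError."""
    tree, pos = parse_chain(text)
    if pos != len(text):
        raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return tree


print(parse("CC(Br)Cl"))
```

Each recursive call consumes at least the opening parenthesis before recursing, so the recursion always makes progress; a rule that can recurse without consuming input is the kind of infinite recursion the validation step is meant to catch.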
|
It's not clear to me that this code belongs here; I'm happy to make a new repo to hold this kind of code.