code improvement for DrugMechDB #223

Open
eKathleenCarter opened this issue May 2, 2024 · 0 comments · May be fixed by #264
[drop_duplicates](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html) is more efficient for lines 197 and 220.
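For illustration, here is a minimal sketch of deduplicating edge rows with `drop_duplicates`. The code at lines 197 and 220 is not shown in this issue, so the DataFrame contents and the choice of `subset` columns below are assumptions rather than the loader's actual logic.

```python
import pandas as pd

# Hypothetical edge table: the column names mirror those used in this issue,
# but the rows are made up for illustration.
df = pd.DataFrame({
    "source_ids": ["DRUG:1", "DRUG:1", "DRUG:2"],
    "target_ids": ["GENE:9", "GENE:9", "GENE:9"],
    "predicates": ["affects", "affects", "affects"],
})

# Drop rows that repeat the same (subject, object, predicate) triple,
# keeping the first occurrence of each. The subset is an assumption.
deduped = df.drop_duplicates(
    subset=["source_ids", "target_ids", "predicates"]
).reset_index(drop=True)

print(len(df), "rows before,", len(deduped), "rows after")
```

This removes duplicates in a single pandas call instead of tracking already-seen rows by hand.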

I would suggest replacing lines 198 through 216 with the following:

df.rename(columns={"dmdb_ids": "drugmechdb_path_id",
                   "qualified_predicates": QUALIFIED_PREDICATE,
                   "object_direction_qualifiers": OBJECT_DIRECTION_QUALIFIER,
                   "object_aspect_qualifiers": OBJECT_ASPECT_QUALIFIER},
          inplace=True)
df[KNOWLEDGE_LEVEL] = KNOWLEDGE_ASSERTION
df[AGENT_TYPE] = MANUAL_AGENT

# Collect the non-null edge properties of each row into a dict.
# The inner list in x[[...]] matters: a bare comma-separated index is read as a single tuple key.
df['edge_props'] = df.apply(lambda x: x[[QUALIFIED_PREDICATE,
                                         OBJECT_DIRECTION_QUALIFIER,
                                         OBJECT_ASPECT_QUALIFIER,
                                         KNOWLEDGE_LEVEL,
                                         AGENT_TYPE]].dropna().to_dict(), axis=1)

for i, row in df.iterrows():
    output_edge = kgxedge(
        subject_id=row["source_ids"],
        object_id=row["target_ids"],
        predicate=row["predicates"],
        edgeprops=row['edge_props'],
        primary_knowledge_source=self.provenance_id
    )
    self.output_file_writer.write_kgx_edge(output_edge)

The reason is that iterrows is EXTREMELY slow and inefficient, so the work done per row inside the loop should be kept to a minimum.
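As a possible next step, the loop could avoid `iterrows` entirely by zipping the needed columns. The sketch below is only an illustration under stated assumptions: the constant values and sample rows are hypothetical stand-ins, and plain tuples take the place of `kgxedge` objects and the file writer, which are not reproduced here.

```python
import pandas as pd

# Hypothetical stand-ins for the constants referenced in this issue.
QUALIFIED_PREDICATE = "qualified_predicate"
OBJECT_DIRECTION_QUALIFIER = "object_direction_qualifier"
KNOWLEDGE_LEVEL = "knowledge_level"

# Made-up rows; the second edge has no qualifiers.
df = pd.DataFrame({
    "source_ids": ["DRUG:1", "DRUG:2"],
    "target_ids": ["GENE:9", "GENE:7"],
    "predicates": ["affects", "affects"],
    QUALIFIED_PREDICATE: ["causes", None],
    OBJECT_DIRECTION_QUALIFIER: ["decreased", None],
})
df[KNOWLEDGE_LEVEL] = "knowledge_assertion"

prop_columns = [QUALIFIED_PREDICATE, OBJECT_DIRECTION_QUALIFIER, KNOWLEDGE_LEVEL]

# Build one dict of non-null properties per row.
df["edge_props"] = df[prop_columns].apply(
    lambda row: row.dropna().to_dict(), axis=1
)

# Iterate over plain column values instead of calling iterrows, which
# constructs a new Series object for every row.
edges = [
    (subject, obj, predicate, props)
    for subject, obj, predicate, props in zip(
        df["source_ids"], df["target_ids"], df["predicates"], df["edge_props"]
    )
]

for edge in edges:
    print(edge)
```

In the real loader, each tuple would instead be passed to `kgxedge(...)` and written with `write_kgx_edge`, as in the snippet above.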

Originally posted by @eKathleenCarter in #221 (comment)
