handle wes/wgs inheritance edge case #4440
Changes from 15 commits
babadd7
7115f81
fb1dc19
de664cd
8d9d25d
877abbc
fb88af9
773ba0c
18d1d63
4d58e03
e1edb07
d806332
3206941
ef5a4ec
2d5d07f
f82c0fe
0158391
0b542e1
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
This folder comprises a Hail (www.hail.is) native Table or MatrixTable. | ||
Written with version 0.2.130-bea04d9c79b5 | ||
Created at 2024/10/02 14:46:35 | ||
Created at 2024/11/04 13:45:23 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,10 @@ | |
FAMILY_2_MITO_SAMPLE_DATA, FAMILY_2_ALL_SAMPLE_DATA, MITO_VARIANT1, MITO_VARIANT2, MITO_VARIANT3, \ | ||
EXPECTED_SAMPLE_DATA_WITH_SEX, SV_WGS_SAMPLE_DATA_WITH_SEX, VARIANT_LOOKUP_VARIANT, \ | ||
MULTI_PROJECT_SAMPLE_TYPES_SAMPLE_DATA, FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA, \ | ||
VARIANT1_BOTH_SAMPLE_TYPES, VARIANT2_BOTH_SAMPLE_TYPES, FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA_MISSING_PARENTAL_WGS | ||
VARIANT1_BOTH_SAMPLE_TYPES, VARIANT2_BOTH_SAMPLE_TYPES, FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA_MISSING_PARENTAL_WGS, \ | ||
VARIANT3_BOTH_SAMPLE_TYPES, VARIANT4_BOTH_SAMPLE_TYPES, VARIANT2_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY, \ | ||
VARIANT1_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY, VARIANT3_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY, \ | ||
VARIANT4_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY | ||
from hail_search.web_app import init_web_app, sync_to_async_hail_query | ||
from hail_search.queries.base import BaseHailTableQuery | ||
|
||
|
@@ -365,34 +368,35 @@ async def test_both_sample_types_search(self): | |
MULTI_PROJECT_BOTH_SAMPLE_TYPE_VARIANTS, gene_counts=GENE_COUNTS, sample_data=MULTI_PROJECT_SAMPLE_TYPES_SAMPLE_DATA, | ||
) | ||
|
||
# Variant1 in family_2 is de novo in exome but maternally inherited in genome. | ||
# Genome passes quality and inheritance, show genotypes for both sample types. | ||
variant1_interval = ['1', 10438, 10440] | ||
# Variant 1 is de novo in exome but maternally inherited in genome. | ||
# Variant 2 is inherited in exome and de novo in genome. | ||
Comment: I would clarify in the comment that inherited means "inherited and homozygous", as usually maternally inherited means mom has one alt allele and the proband inherited that and has 1 alt allele.
Comment: You updated the comment for variant 1 but not variant 2. De novo is usually a het call, so it's still confusing.
Comment: Do you think it would be better if these variants are heterozygous and inherited? That seems like it's more representative of what we'd see in real searches.
Comment: I think having at least one variant passing for a homozygous recessive search is valuable for testing, and I like the coverage we get from this variant configuration. And we do sometimes see de novo homozygotes, we just don't refer to them as plain "de novo", which is why I think a comment update is sufficient. |
||
# Variant 3 is inherited in both sample types. Variant 4 is de novo in both sample types. | ||
inheritance_mode = 'recessive' | ||
await self._assert_expected_search( | ||
[VARIANT1_BOTH_SAMPLE_TYPES], sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA, inheritance_mode=inheritance_mode, | ||
**COMP_HET_ALL_PASS_FILTERS, intervals=[variant1_interval] | ||
[VARIANT1_BOTH_SAMPLE_TYPES, VARIANT2_BOTH_SAMPLE_TYPES, [VARIANT3_BOTH_SAMPLE_TYPES, VARIANT4_BOTH_SAMPLE_TYPES]], | ||
sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA, inheritance_mode=inheritance_mode, | ||
**COMP_HET_ALL_PASS_FILTERS | ||
) | ||
# Exome passes quality and inheritance, show genotypes for both sample types. | ||
inheritance_mode = 'de_novo' | ||
await self._assert_expected_search( | ||
[VARIANT1_BOTH_SAMPLE_TYPES], sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA, inheritance_mode=inheritance_mode, | ||
intervals=[variant1_interval] | ||
[VARIANT1_BOTH_SAMPLE_TYPES, VARIANT2_BOTH_SAMPLE_TYPES, VARIANT4_BOTH_SAMPLE_TYPES], | ||
sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA, inheritance_mode=inheritance_mode, | ||
**COMP_HET_ALL_PASS_FILTERS | ||
Comment: we shouldn't need |
||
) | ||
|
||
# Variant 2 in family_2 is inherited in exome and there is no parental data in genome. | ||
# Genome and exome pass quality and inheritance, show genotypes for both sample types. | ||
variant2_interval = ['1', 38724418, 38724420] | ||
# Same variants, but genome data is proband-only. | ||
inheritance_mode = 'recessive' | ||
Comment: I think this test file would be more readable if, instead of toggling the inheritance back and forth, you run both recessive searches back to back and then both de novo searches, so it's clearer that nothing's changing other than the parental data.
Comment: this line is unneeded |
||
await self._assert_expected_search( | ||
[VARIANT2_BOTH_SAMPLE_TYPES], sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA_MISSING_PARENTAL_WGS, | ||
inheritance_mode=inheritance_mode, **COMP_HET_ALL_PASS_FILTERS, intervals=[variant2_interval] | ||
[VARIANT1_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY, VARIANT2_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY, | ||
[VARIANT3_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY, VARIANT4_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY]], | ||
sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA_MISSING_PARENTAL_WGS, inheritance_mode=inheritance_mode, | ||
**COMP_HET_ALL_PASS_FILTERS | ||
) | ||
# Genome passes quality and inheritance exome fails inheritance (parental data shows variant is inherited). | ||
Comment: I still think a comment here explaining what's being tested is helpful. Maybe something like "Variant 2 fails inheritance when parental data is present".
Comment: updated! |
||
inheritance_mode = 'de_novo' | ||
Comment: this line is unneeded |
||
await self._assert_expected_search( | ||
[VARIANT2_BOTH_SAMPLE_TYPES], sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA_MISSING_PARENTAL_WGS, | ||
inheritance_mode=inheritance_mode, intervals=[variant2_interval] | ||
[VARIANT1_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY, VARIANT4_BOTH_SAMPLE_TYPES_PROBAND_WGS_ONLY], | ||
sample_data=FAMILY_2_BOTH_SAMPLE_TYPE_SAMPLE_DATA_MISSING_PARENTAL_WGS, inheritance_mode=inheritance_mode, | ||
**COMP_HET_ALL_PASS_FILTERS | ||
) | ||
|
||
async def test_inheritance_filter(self): | ||
|
Commenting for posterity that I'm not 100% sure this is the final logic we want, because it would allow us to return variants that pass quality and fail inheritance in one sample type and fail quality and pass inheritance in the other, meaning there's no sample type that clearly passes both inheritance and quality. However, I think we maybe do want to return these; the logic gets kind of confusing and I can't be sure they would not be helpful. I think being overly permissive here is better: if the analysts see a bunch of cases where they ultimately think the returned variants are not helpful and should be filtered out, we can always go back later and make this a stricter criterion. So we should leave this as is for now.
This makes sense. We may want to consider quality and inheritance passing together and handle it differently (the `&` seems like it could be too simple), but it's not clear to me what the logic/change would be. I agree that trying this out and getting feedback from analysts before we do that is the way to go.
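
For anyone reading this thread later, here is a minimal sketch of the two criteria being discussed. It is plain Python rather than Hail, and every name in it (`SampleTypeResult`, `permissive_pass`, `strict_pass`) is hypothetical and does not come from the hail_search code. It assumes the current behaviour amounts to "some sample type passes quality" combined with "some sample type passes inheritance", which is one reading of the comments above and not a confirmed description of the implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleTypeResult:
    """Hypothetical outcome of the filters for one variant in one sample type."""
    passes_quality: bool
    passes_inheritance: bool


def permissive_pass(wes: SampleTypeResult, wgs: SampleTypeResult) -> bool:
    # Assumed current behaviour: quality and inheritance are each checked
    # across sample types and then combined with a single "and".
    any_quality = wes.passes_quality or wgs.passes_quality
    any_inheritance = wes.passes_inheritance or wgs.passes_inheritance
    return any_quality and any_inheritance


def strict_pass(wes: SampleTypeResult, wgs: SampleTypeResult) -> bool:
    # Possible stricter alternative: at least one sample type must pass
    # both quality and inheritance on its own.
    return any(r.passes_quality and r.passes_inheritance for r in (wes, wgs))


# The edge case from the discussion: each sample type passes only one check.
mixed_wes = SampleTypeResult(passes_quality=True, passes_inheritance=False)
mixed_wgs = SampleTypeResult(passes_quality=False, passes_inheritance=True)
```

Under these assumptions, `permissive_pass(mixed_wes, mixed_wgs)` returns the variant while `strict_pass(mixed_wes, mixed_wgs)` filters it out, which is the difference analysts' feedback would help decide on.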