column_documentation.tsv
table column description
ext_ This prefix designates a table that's a roughly 1:1 copy of an external dataset
sequencing_batch_status finalized_status Indicates whether a batch is stable and ready for release (true) or in-progress and should not be released (false).
automation_state This table is used by the automated pipeline to store states (of its sub-programs).
automation_state program_name The name of a program that is part of the pipeline.
automation_state state The state of the program as text. The format
bag_meldeformular "Meldeformular data provided by the BAG. See the BAG codebook, which is sometimes updated. Contact: TS or SN or CC."
bag_meldeformular iso_country_exp "Exposure country coded with ISO 3166-1 alpha-3, three-letter country code standardized to only those in 'country' table."
bag_dashboard_meldeformular "Meldeformular data provided by the BAG, it is updated daily. This table is tailored for the timeseries dashboard. Column definitions can be found in the Codebook COVID-19 for SSPH+."
bag_test_numbers "Data provided by the BAG, it is updated daily."
bag_test_numbers date The date on which the tests were taken.
bag_test_numbers positive_tests The number of positive tests
bag_test_numbers negative_tests The number of negative tests
bag_test_numbers canton Canton code. It is only provided since 23.05.2020
bag_test_numbers age_group "The age group, for example, ""0 - 9"". It is only provided since 23.05.2020."
consensus_sequence sample_name Name given to the sequenced material (assigned by the sequencing center).
consensus_sequence ethid ETH identifier for a sample.
consensus_sequence header Fasta header in the data source (covid19-pangolin/backup/working/samples/<sample_name>/<date_flowcell>/references/ref_ambig.fasta). Format should be <sample_name>-<date>_<flowcell>
consensus_sequence seq "Consensus sequence. Positions with < 5x coverage are ""N"". Minor bases with >= 5% frequency and present in >= 2 reads contribute to an IUPAC ambiguity code at a position. Lowercase letters indicate < 50x coverage. Sequences may include gaps, which are coded with ""-""."
consensus_sequence coverage Mean coverage. From V-pipe output qa.csv.
consensus_sequence r1_basequal "FastQC per base quality flag for read 1. Based on quality scores. WARNING if the lower quartile for any base is less than 10, or if the median for any base is less than 25. FAIL if the lower quartile for any base is less than 5 or if the median for any base is less than 20. From V-pipe output qa.csv."
consensus_sequence r2_basequal "FastQC per base quality flag for read 2. Based on quality scores. WARNING if the lower quartile for any base is less than 10, or if the median for any base is less than 25. FAIL if the lower quartile for any base is less than 5 or if the median for any base is less than 20. From V-pipe output qa.csv."
consensus_sequence rejreads Percentage of reads rejected by Prinseq. From V-pipe output qa.csv.
consensus_sequence alnreads Percentage of kept reads that were aligned. From V-pipe output qa.csv.
consensus_sequence insertsize Number of bases covered by a read pair (amplicon length). From V-pipe output qa.csv.
consensus_sequence consensus_n Number of bases with less than 5 reads coverage. From V-pipe output qa.csv.
consensus_sequence consensus_lcbases Number of bases with less than 50 reads coverage. From V-pipe output qa.csv.
consensus_sequence divergence "Number of sites where seq != reference genome MN908947 and seq != N and seq != ""-"". From Nextstrain diagnostic.py output."
consensus_sequence excess_divergence divergence - expected_divergence, where expected_divergence = (days between Dec 1 2019 and sample collection date) * 25/365. From Nextstrain diagnostic.py output.
consensus_sequence number_n "Number of bases coded ""N"". From Nextstrain diagnostic.py output."
consensus_sequence number_gaps Number of gaps compared to reference genome MN908947. From Nextstrain diagnostic.py output.
consensus_sequence clusters Regions with SNPs clustered in close genomic proximity. From Nextstrain diagnostic.py output.
consensus_sequence gaps "Gaps compared to reference genome MN908947. From Nextstrain diagnostic.py output. These ranges are 0-indexed and start-exclusive, end-inclusive."
consensus_sequence all_snps Genome positions of SNPs compared to reference genome MN908947. From Nextstrain diagnostic.py output.
consensus_sequence flagging_reason Sequence quality issue flagged by Nextstrain diagnostic.py.
consensus_sequence fail_reason "Reason the sample fails quality control. ""no fail reason"" passes QC. Null values should correspond to rows that are not true samples. This column is generated by a script written by cEvo and the values should be treated with a grain of salt because the QC criteria we use to submit to GISAID have changed several times."
consensus_sequence sequencing_center "Where the sample was sequenced. fgcz = Functional Genomics Center Zurich, gfb = Genomics Facility Basel, h2030 = Health 2030."
consensus_sequence sequencing_batch <Date of the sequencing run in YYYYMMDD format>_<flowcell name>.
consensus_sequence comment Any notes about why the sample is special or additional information.
consensus_sequence variant_of_concern "The name of a variant of concern, or just <null>."
consensus_sequence is_random Whether the sample was randomly selected or part of a targeted investigation.
consensus_sequence seq_unaligned "Unaligned sequence as a character string. Column introduced June 2021 to handle sequences with insertions; currently the column is only used on a case-by-case basis (e.g. for comparing PacBio results to Illumina)."
consensus_sequence dont_release Boolean specifying whether (TRUE) a sequence should be excluded from release.
consensus_sequence_mutation_nucleotide sample_name The sample name as in consensus_sequence.
consensus_sequence_mutation_nucleotide position The position in the sequence as aligned as usual.
consensus_sequence_mutation_nucleotide mutation The mutated base codon in the sequence. It has to be upper-case!
consensus_sequence_nextclade_mutation_aa The amino acid mutations that Nextclade detected.
consensus_sequence_nextclade_mutation_aa sample_name The sample name as in consensus_sequence.
consensus_sequence_nextclade_mutation_aa aa_mutation "The mutation in the Nextclade format. Examples: ""S:N123Y"", ""ORF8:Y45-"""
consensus_sequence_nextclade_data Results from Nextclade without the mutations
country A list of countries
country_old An incomplete (!) list of countries used by the timeseries dashboard
dashboard_state Stores general state information of the timeseries dashboard. It should contain exactly one row.
dashboard_state last_data_update The date on which the bag_dashboard_meldeformular table was last updated.
gisaid_sequence The GISAID nextfasta and nextmeta dataset.
gisaid_sequence date "The parsed date. If the dataset only provides a year or a month, it is undefined which exact day will be set."
gisaid_sequence date_str "The date in the same format as it was provided in the dataset. E.g., in some cases, only the year is provided."
gisaid_sequence original_seq The sequence from the dataset.
gisaid_sequence aligned_seq The sequence after alignment with our reference using mafft.
gisaid_sequence_mutation_nucleotide strain The strain as in gisaid_sequence.
gisaid_sequence_mutation_nucleotide position The position in the sequence as aligned as usual.
gisaid_sequence_mutation_nucleotide mutation The mutated base codon in the sequence. It has to be upper-case!
swiss_canton Official canton codes and names in various languages according to p. 81 in https://www.bk.admin.ch/dam/bk/en/dokumente/sprachdienste/English%20Style%20Guide.pdf.download.pdf/english_style_guide.pdf
ext_swiss_demographic "Demographic balance by age and canton (px-x-0102020000_104), provided by the Swiss Federal Statistical Office, https://www.pxweb.bfs.admin.ch/pxweb/en/px-x-0102020000_104/px-x-0102020000_104/px-x-0102020000_104.px, contains data between 2010 and 2019."
variant_mutation The mutations that characterize a variant.
variant_mutation variant_name The name of a variant
variant_mutation aa_mutation "The mutation in the Nextclade format. Examples: ""S:N123Y"", ""ORF8:Y45-"""
viollier_plate Viollier's plates containing both substance of positive and negative tests
viollier_plate viollier_plate_name Plate names set by Viollier. They usually have the format <day><month><year>eg<number> or <day><month><year>wuhan<number>. The name is stored in lower case.
viollier_plate gfb_number "The name a plate gets when it is shipped to the Genomics Facility Basel. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to GFB. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
viollier_plate fgcz_name "The name a plate gets when it is shipped to the Functional Genomics Center Zurich. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to FGCZ. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
viollier_plate sequencing_center "Where the plate was sent for sequencing, or whether it stayed at Viollier for in-house sequencing."
viollier_plate left_viollier_date The date when a plate was shipped to a genomic facility. It was introduced in October 2020 and is not set for earlier entries.
viollier_plate has_no_extract The plate was run on a BioRad CFX machine without producing a highly purified RNA extract.
viollier_plate comment Additional information as free text.
viollier_test Information about both positive and negative tests at Viollier. It should contain all positive tests but not all negative tests.
viollier_test sample_number "A unique number for every test, it is defined by the diagnostic lab."
viollier_test ethid A unique number defined by us. It is defined for all samples that we sequenced and more.
viollier_test order_date The order date
viollier_test zip_code The zip code of the office of the doctor.
viollier_test city The city of the office of the doctor.
viollier_test canton The canton of the office of the doctor.
viollier_test pcr_code A technical code set by Viollier. 4 corresponds to a positive test.
viollier_test is_positive Whether a test is positive. It should be true if and only if pcr_code is 4.
viollier_test purpose "The sampling strategy/purpose. Allowed values are currently ""surveillance"" and ""diagnostic""."
viollier_test sequenced_by_viollier Whether the plate was sequenced by Viollier; Viollier started sequencing in week 15/16 of 2021.
viollier_test comment Additional information as free text.
viollier_test__viollier_plate Connects the viollier_test and viollier_plate table
viollier_test__viollier_plate sample_number As defined in viollier_test
viollier_test__viollier_plate viollier_plate_name As defined in viollier_plate
viollier_test__viollier_plate well_position The position of the sample on the plate. It is in upper-case.
viollier_test__viollier_plate e_gene_ct "As of 11.06.2021 Viollier only provides a single CT value, which is written into this column. "
viollier_test__viollier_plate rdrp_gene_ct CT value of the RdRp gene
viollier_test__viollier_plate seq_request Boolean specifying whether (TRUE) a sample was included on the list of samples we sent to the sequencing centers to sequence. This column is filled in from queries in script sql/viollier_test.sql and has only been filled in for samples sent from viollier beginning on 2021-04-19.
bag_sequence_report auftraggeber_nummer "A unique number for every test, it is defined by the BAG. Called the 'sample_number' in other tables. Blank for non-Viollier samples unless we are able to get this number from the data source (e.g. hospital)."
bag_sequence_report alt_seq_id Any other identifier we have for a sequence that could help the BAG identify it.
bag_sequence_report viro_purpose "The reason for sampling. ""outbreak"" (e.g. outbreak investigation, or re-sampling of a possible re-infection case); ""travel_case"" (e.g. sequences from people recently arrived from the U.K.); ""surveillance"" (random sampling of regular laboratory positive samples); ""screening"" (random sampling of asymptomatic individuals, e.g. of Army recruits)."
bag_sequence_report viro_source "The source of the sample. ""Swiss Viollier Sequencing Consortium""; ""Armee""; hospital or laboratory information."
bag_sequence_report viro_seq "The group responsible for sequencing. ""ETHZ, D-BSSE"""
bag_sequence_report viro_characterised "How the sample was analysed. ""no"" (not analysed); ""mcPCR_501Y"", ""sangerS"", ""wgs"" (whole-genome sequencing)"
bag_sequence_report viro_gisaid_id GISAID identifier for the sequence. Blank for un-released (low quality) or not-yet-released samples.
bag_sequence_report viro_genbank_id Genbank identifier for the sequence. Blank for un-released (low quality) or not-yet-released samples.
bag_sequence_report viro_ref_sequence_id "Reference sequence against which mutations are called. ""MN908947.3"" (https://www.ncbi.nlm.nih.gov/nuccore/MN908947)"
bag_sequence_report viro_relevant_mutations_to_ref_seq "List of amino acid mutations relative to the reference sequence. For ETHZ, D-BSSE sequences these are generated with the nextclade tool. "
bag_sequence_report viro_label "Variant of concern label. ""B.1.1.7""; ""501Y.V2"""
non_viollier_test This table is used to store metadata for samples that don't come from Viollier.
ext_demography_age "The number of people per age group (5-years brackets) and country. The numbers are estimates made by the UN for the year 2020. The data file ""Population by Age Groups - Both Sexes"" was downloaded on 27.03.2021 from https://population.un.org/wpp/Download/Standard/Population/ / https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/1_Population/WPP2019_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.xlsx."
ext_owid_global_cases iso_country ISO 3166-1 alpha-3 three-letter country codes standardized to only those in 'country' table
ext_owid_global_cases country Location as it appears in the source data.
ext_owid_global_cases date Date of observation.
ext_owid_global_cases new_cases_per_million "New confirmed cases of COVID-19 per 1,000,000 people from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University."
ext_owid_global_cases new_deaths_per_million "New deaths attributed to COVID-19 per 1,000,000 people from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University."
ext_owid_global_cases new_cases New confirmed cases of COVID-19 from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
ext_owid_global_cases new_deaths New deaths attributed to COVID-19 from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
ext_fso_tourist_accomodation "For more information on this table, see the saved query to the FSO database: https://www.pxweb.bfs.admin.ch/sq/57be922e-05fb-4cad-ab19-bd823c54fb1d"
ext_fso_tourist_accomodation iso_country "ISO 3166-1 alpha-3 three-letter country codes standardized to those in 'country' table. Unknown countries are coded as 'XXX', e.g. 'Other European countries' in data source"
ext_fso_tourist_accomodation country Origin country of travellers as it appears in the data source.
ext_fso_tourist_accomodation date_type Timespan of date. Monthly.
ext_fso_tourist_accomodation date "Date of observation. The data is monthly, so dates are the first of the month."
ext_fso_tourist_accomodation n_arrivals Number of travellers arrived.
ext_fso_cross_border_commuters "For more information on this table, see the saved query to the FSO database: https://www.pxweb.bfs.admin.ch/sq/1f6791fe-7f97-4466-a1e7-82314ede5277"
ext_fso_cross_border_commuters iso_code "ISO 3166-1 alpha-3 three-letter country codes standardized to those in 'country' table. Unknown countries are coded as 'XXX', e.g. Andere in data source"
ext_fso_cross_border_commuters country Origin country of workers as it appears in the data source.
ext_fso_cross_border_commuters date_type Timespan of date. Quarterly.
ext_fso_cross_border_commuters wirtschaftsabteilung Employment sector (in German).
ext_fso_cross_border_commuters date "Date of observation. The data is quarterly, so dates are the first of the first month of the quarter."
ext_fso_cross_border_commuters n_permits Number of cross-border workers with G permit.
frameshift_deletion_diagnostic sample_name Name given to the sequenced material (assigned by the sequencing center).
frameshift_deletion_diagnostic start_position Start position of deletion with respect to reference genome MN908947.
frameshift_deletion_diagnostic indel_type Type of mutation detected (deletion/insertion/stopgain/stoploss)
frameshift_deletion_diagnostic length Length of deletion (in nucleotide units).
frameshift_deletion_diagnostic gene_region Gene in which the deletion is found. Lara says the nucleotide position to gene mapping is taken somehow from the visualization part of V-pipe: https://github.com/cbg-ethz/V-pipe/blob/caesar_div/references/gffs/Genes_NC_045512.2.GFF3
frameshift_deletion_diagnostic reads_all Total number of reads covering the first position of the deletion.
frameshift_deletion_diagnostic reads_fwd Total number of forward reads covering the deletion.
frameshift_deletion_diagnostic reads_rev Total number of reverse reads covering the deletion.
frameshift_deletion_diagnostic deletions Number of reads supporting the deletion.
frameshift_deletion_diagnostic freq_del Fraction of reads supporting the deletion.
frameshift_deletion_diagnostic freq_del_fwd Fraction of forward reads supporting the deletion.
frameshift_deletion_diagnostic freq_del_rev Fraction of reverse reads supporting the deletion.
frameshift_deletion_diagnostic deletions_fwd Number of forward reads supporting the deletion.
frameshift_deletion_diagnostic deletions_rev Number of reverse reads supporting the deletion.
frameshift_deletion_diagnostic insertions Number of reads supporting the insertion.
frameshift_deletion_diagnostic freq_insert Fraction of reads supporting the insertion.
frameshift_deletion_diagnostic freq_insert_fwd Fraction of forward reads supporting the insertion.
frameshift_deletion_diagnostic freq_insert_rev Fraction of reverse reads supporting the insertion.
frameshift_deletion_diagnostic insertions_fwd Number of forward reads supporting the insertion.
frameshift_deletion_diagnostic insertions_rev Number of reverse reads supporting the insertion.
frameshift_deletion_diagnostic matches_ref Number of reads where the base matches the ref-base.
frameshift_deletion_diagnostic pos_critical_inserts Start positions (in reference genome coordinates) of insertions in the same gene_region that occur in > 40% of reads.
frameshift_deletion_diagnostic pos_critical_dels Start positions (in reference genome coordinates) of deletions in the same gene_region that occur in > 40% of reads.
frameshift_deletion_diagnostic homopolymeric "True if either around the start or end position of the deletion three bases are the same, which may have caused the polymerase to skip during reverse transcription of viral RNA to cDNA, e.g. AATAG."
frameshift_deletion_diagnostic ref_base Base in the reference genome.
frameshift_deletion_diagnostic indel_diagnosis Summary of support for / explanation of the frameshift insertion/deletion or the stopgain/stoploss event. This is reported to GISAID upon submission.
frameshift_deletion_diagnostic indel_position Long form summary of insertion/deletion/stopgain/stoploss position for report to GISAID.
frameshift_deletion_diagnostic stops Number of reads supporting the stop codon (count is done at the first position of the codon).
frameshift_deletion_diagnostic freq_stop Fraction of reads supporting the stop codon.
frameshift_deletion_diagnostic freq_stop_fwd Fraction of forward reads supporting the stop codon.
frameshift_deletion_diagnostic freq_stop_rev Fraction of reverse reads supporting the stop codon.
frameshift_deletion_diagnostic stops_fwd Number of forward reads supporting the stop codon.
frameshift_deletion_diagnostic stops_rev Number of reverse reads supporting the stop codon.
swiss_wastewater_plant Wastewater plant names.
foph_travel_quarantine Compilation of FOPH travel quarantine orders (created by hand on 10.03.2021). Sources are copies of the quarantine law from https://www.bag.admin.ch/bag/en/home/krankheiten/ausbrueche-epidemien-pandemien/aktuelle-ausbrueche-epidemien/novel-cov/empfehlungen-fuer-reisende/liste.html and https://www.fedlex.admin.ch/eli/cc/2020/496/en.
ext_country_coordinates "Coordinates of the centroids of various countries taken from the R package CoordinateCleaner's data, which they say they get from http://geo-locate.org. Since there are multiple entries per country, these are the average coordinates."
sequence_identifier ethid "This column is the key column: each ethid corresponds to a unique sample, and each sample should only be published once."
sequence_identifier gisaid_id "This column is filled in by the script import_gisaid_epi_isl.R, which relies on data downloads from GISAID to map between ethid and GISAID EPI accession once a sequence has been accepted."
sequence_identifier sample_name This column is filled in by the script export_gisaid_submission.R when the GISAID submission is prepared. It only began being filled out in mid-March or early April 2021.
sequence_identifier gisaid_uploaded_at This column is filled in by the script export_gisaid_submission.R when the GISAID submission is prepared. It only began being filled out in mid-April 2021.
sequence_identifier ena_id This column was filled in based on a semi-manual mapping of virus names between GISAID and ENA from SPSP in April 2021.
pangolin_lineage_alias "If a lineage with three numbers (e.g., B.1.617.2) is about to receive a sub-lineage, it gets an alias. For example, B.1.617.2 has the alias AY. B.1.617.2.1 is then AY.1. See https://github.com/cov-lineages/pango-designation/blob/master/alias_key.json for a list of the aliases. Recombinant lineages (which have more than one parental lineage) are excluded."
ext_problematic_site "The problematic (nucleotide) sites. Source: https://github.com/W-L/ProblematicSites_SARS-CoV2 (v5, commit 31ad9d4)"
lab_code_foph "This table was introduced 31.08.2021 after email ""FOPH - Data flow national genomic SARS-CoV-2 surveillance program"" from SPSP. It's used by the script R/export_spsp_submission.R"
lab_code_foph lab_code_foph A 5-digit ID code that FOPH assigned to each diagnostics lab for electronic reporting to MSys according to the FOPH.
lab_code_foph lab_name Name of the lab according to the FOPH-provided list.
lab_code_foph covv_orig_lab Name of the lab as we submit it to SPSP.
billing_report "This view was created using ""billing.sql"" by SN. It aims to report number of samples FROM VIOLLIER sent for sequencing, finished sequencing, and successfully submitted per sequencing center per week."
billing_report week_finished "Boolean. Starting with 2021 week 22, true if the number of sequence requests equals the number of samples for which we got back sequencing data."
billing_report samples_sent Number of positive samples with metadata in viollier_test that are on a plate recorded as having left viollier.
billing_report samples_seq_request Number of positive samples with metadata in viollier_test that are on a plate recorded as having left viollier AND have seq_request 'true' in viollier_test__viollier_plate.
billing_report sequencing_batches "All the distinct sequencing batches found among samples collected in this week, sent to this sequencing center, for which we received data back."
billing_report sequenced "Number of distinct samples (where a sample is a metadata entry in viollier_test) collected in this week, sequenced by this sequencing center, for which we received data back."
billing_report submittable Number of 'sequenced' samples where fail_reason in consensus_sequence is 'no_fail_reason'.
billing_report gisaid_submitted_but_not_confirmed Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) but do not have a GISAID_EPI_ISL associated in the database yet.
billing_report gisaid_confirmed_upload Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) and have a GISAID_EPI_ISL associated.
billing_report submitted_to_GISAID Total number of sequences submitted to GISAID; sum of 'gisaid_submitted_but_not_confirmed' and 'gisaid_confirmed_upload'.
billing_report submitted_to_GISAID_percentage Fraction of 'sequenced' samples submitted to GISAID; 'submitted_to_GISAID' divided by 'sequenced'.
billing_report_team_w "This view was created by SN. It aims to report number of samples FROM TEAM W sent for sequencing, finished sequencing, and successfully submitted per sequencing center per week."
billing_report_team_w week_finished "Boolean. Checks that the number of sequences received from a week is a multiple of 93 (the number of samples team w puts on a plate). As of Sept 2021, team w plates always contain samples from the same week."
billing_report_team_w samples_sent Number of entries in non_viollier_test from team w collected in this week.
billing_report_team_w sequencing_batches All the distinct sequencing batches found among samples collected in this week for which we received data back.
billing_report_team_w sequenced "Number of samples (where a sample is a metadata entry in non_viollier_test) collected in this week, sequenced by this sequencing center, for which we received data back."
billing_report_team_w submittable Number of 'sequenced' samples where fail_reason in consensus_sequence is 'no_fail_reason'.
billing_report_team_w gisaid_submitted_but_not_confirmed Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) but do not have a GISAID_EPI_ISL associated in the database yet.
billing_report_team_w gisaid_confirmed_upload Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) and have a GISAID_EPI_ISL associated.
billing_report_team_w submitted_to_GISAID Total number of sequences submitted to GISAID; sum of 'gisaid_submitted_but_not_confirmed' and 'gisaid_confirmed_upload'.
billing_report_team_w submitted_to_GISAID_percentage Fraction of 'sequenced' samples submitted to GISAID; 'submitted_to_GISAID' divided by 'sequenced'.
z_test_metadata Information about (positive and negative) COVID-19 PCR tests
z_test_metadata test_id A Swiss-wide unique identifier for a test. It has the format `<lab name>/<Laborauftragsnummer>`.
z_test_metadata ethid "A unique number defined by us. It has to be defined for all samples that we sequence. This number will be added to the ""strain"" name that we submit to public databases."
z_test_metadata order_date The date on which the lab received the order. We expect it to be the same as the sampling date.
z_test_metadata canton The canton of sampling or of the office of the doctor.
z_test_metadata zip_code The zip code of sampling or of the office of the doctor.
z_test_metadata city The city of sampling or of the office of the doctor.
z_test_metadata is_positive Whether a test is positive
z_test_metadata comment Additional information as free text
z_extraction_plate Information about plates produced by the originating labs containing RNA extracts and sent to the sequencing centers
z_extraction_plate extraction_plate_id A Swiss-wide unique identifier for an extraction plate. It has the format `<lab name>/<plate name given by the lab>`.
z_extraction_plate gfb_number "The name a plate gets when it is shipped to the Genomics Facility Basel. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to GFB. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
z_extraction_plate fgcz_name "The name a plate gets when it is shipped to the Functional Genomics Center Zurich. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to FGCZ. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
z_extraction_plate health2030 A boolean indicating whether a plate was shipped to Health 2030. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter.
z_extraction_plate left_lab_or_received_metadata_date "Originally, it was the date when a plate was shipped to a genomic facility. It was introduced in October 2020 and is not set for earlier entries. Since June 2021, we don't necessarily know the exact date when a plate was shipped but rather insert the date when we received the metadata."
z_extraction_plate sequencing_center "Where the plate was sent for sequencing, or whether it stayed at Viollier for in-house sequencing."
z_extraction_plate viollier_extract_free "A boolean indicating whether this plate is from Viollier and ""extract free"". Extract free plates should contain the RNA material but are not purified. So far, we have not been able to generate sequences from those plates."
z_extraction_plate comment Additional information as free text
z_sequencing_plate Information about plates that go into the sequencers. They may be equivalent to the extraction plates.
z_sequencing_plate sequencing_plate_id A unique identifier for a sequencing plate.
z_sequencing_plate sequencing_center The genomic facility that sequenced the plate
z_sequencing_plate sequencing_date The date of sequencing
z_sequencing_plate comment Additional information as free text
z_test_plate_mapping "A mapping between test_metadata, extraction_plate and sequencing_plate"
z_test_plate_mapping test_id As defined in test_metadata
z_test_plate_mapping old_sample "A sample that left Viollier before 07 June 2021 is considered as old. We have some wrong mappings in the old data: for example, we have multiple tests mapped to the same plate and well. For the new data (i.e., old_sample=false), we will prevent this by enforcing a uniqueness of plate-well combinations in this table."
z_test_plate_mapping extraction_plate As defined in extraction_plate
z_test_plate_mapping extraction_plate_well The well position on the extraction plate
z_test_plate_mapping extraction_e_gene_ct "The E gene CT. As of 11.06.2021, Viollier provides only a single CT value, which is written into this column."
z_test_plate_mapping extraction_rdrp_gene_ct The RdRp gene CT
z_test_plate_mapping sequencing_plate As defined in sequencing_plate
z_test_plate_mapping sequencing_plate_well The well position on the sequencing plate
z_test_plate_mapping sample_type "The sample type. Valid values are ""clinical"", ""empty"", ""positive control"", ""negative control"", and ""wastewater""."
z_consensus_sequence The generated consensus sequences
z_consensus_sequence sample_name Name given to the sequenced material (ultimately assigned by the sequencing center)
z_consensus_sequence sequencing_plate As defined in sequencing_plate
z_consensus_sequence sequencing_plate_well The well position on the sequencing plate
z_consensus_sequence insert_date The date when the sequence was first inserted
z_consensus_sequence update_date The date when we update the sequence
z_consensus_sequence sequencing_center The genomic facility that did the sequencing
z_consensus_sequence sequencing_batch The V-Pipe sequencing batch name
z_consensus_sequence seq_aligned The aligned version of the sequence
z_consensus_sequence seq_unaligned The unaligned version of the sequence
z_consensus_sequence ethid ETH identifier for a sample.
z_consensus_sequence_meta Information that is derived from or related to a consensus sequence. This includes QC values and lineage information.
z_consensus_sequence_meta sample_name As defined in consensus_sequence
z_consensus_sequence_meta references -
z_consensus_sequence_meta coverage_mean -
z_consensus_sequence_meta r1_basequal -
z_consensus_sequence_meta r2_basequal -
z_consensus_sequence_meta rejreads -
z_consensus_sequence_meta alnreads -
z_consensus_sequence_meta insertsize -
z_consensus_sequence_meta consensus_n -
z_consensus_sequence_meta qc_result "The summarized result. The column was formerly called ""fail_reason""."
z_consensus_sequence_meta diagnostic_divergence -
z_consensus_sequence_meta diagnostic_excess_divergence -
z_consensus_sequence_meta diagnostic_number_n -
z_consensus_sequence_meta diagnostic_number_gaps -
z_consensus_sequence_meta diagnostic_clusters -
z_consensus_sequence_meta diagnostic_gaps -
z_consensus_sequence_meta diagnostic_all_snps -
z_consensus_sequence_meta diagnostic_flagging_reason -
z_consensus_sequence_meta nextclade_clade -
z_consensus_sequence_meta nextclade_qc_overall_score -
z_consensus_sequence_meta nextclade_qc_overall_status -
z_consensus_sequence_meta nextclade_total_gaps -
z_consensus_sequence_meta nextclade_total_insertions -
z_consensus_sequence_meta nextclade_total_missing -
z_consensus_sequence_meta nextclade_total_mutations -
z_consensus_sequence_meta nextclade_total_non_acgtns -
z_consensus_sequence_meta nextclade_total_pcr_primer_changes -
z_consensus_sequence_meta nextclade_alignment_start -
z_consensus_sequence_meta nextclade_alignment_end -
z_consensus_sequence_meta nextclade_alignment_score -
z_consensus_sequence_meta nextclade_qc_missing_data_score -
z_consensus_sequence_meta nextclade_qc_missing_data_status -
z_consensus_sequence_meta nextclade_qc_missing_data_total -
z_consensus_sequence_meta nextclade_qc_mixed_sites_score -
z_consensus_sequence_meta nextclade_qc_mixed_sites_status -
z_consensus_sequence_meta nextclade_qc_mixed_sites_total -
z_consensus_sequence_meta nextclade_qc_private_mutations_cutoff -
z_consensus_sequence_meta nextclade_qc_private_mutations_excess -
z_consensus_sequence_meta nextclade_qc_private_mutations_score -
z_consensus_sequence_meta nextclade_qc_private_mutations_status -
z_consensus_sequence_meta nextclade_qc_private_mutations_total -
z_consensus_sequence_meta nextclade_qc_snp_clusters_clustered -
z_consensus_sequence_meta nextclade_qc_snp_clusters_score -
z_consensus_sequence_meta nextclade_qc_snp_clusters_status -
z_consensus_sequence_meta nextclade_qc_snp_clusters_total -
z_consensus_sequence_meta nextclade_errors -
z_consensus_sequence_meta pango_lineage -
z_consensus_sequence_meta pango_probability -
z_consensus_sequence_meta pango_learn_version -
z_consensus_sequence_meta pango_status -
z_consensus_sequence_meta pango_note -
z_consensus_sequence_mutation_aa The amino acid mutations of a sequence
z_consensus_sequence_mutation_aa sample_name As defined in consensus_sequence
z_consensus_sequence_mutation_aa aa_mutation An amino acid mutation
z_consensus_sequence_mutation_nucleotide The nucleotide mutations of a sequence
z_consensus_sequence_mutation_nucleotide sample_name As defined in consensus_sequence
z_consensus_sequence_mutation_nucleotide nuc_mutation A nucleotide mutation
z_consensus_sequence_notes Manually curated notes about sequences
z_consensus_sequence_notes sample_name As defined in consensus_sequence
z_consensus_sequence_notes release_decision "Can be used to manually override ""qc_result"""
z_consensus_sequence_notes purpose A specific reason for sequencing
z_consensus_sequence_notes comment Additional information as free text