database/column_documentation.tsv

table	column	description
ext_		This prefix designates a table that's a roughly 1:1 copy of an external dataset
sequencing_batch_status	finalized_status	Indicates whether a batch is stable and ready for release (true) or in-progress and should not be released (false).
automation_state		This table is used by the automated pipeline to store states (of its sub-programs).
automation_state	program_name	The name of a program that is part of the pipeline.
automation_state	state	The state of the program as text. The format
bag_meldeformular		"Meldeformular data provided by the BAG. See the BAG codebook, which is sometimes updated. Contact: TS or SN or CC."
bag_meldeformular	iso_country_exp	"Exposure country coded with ISO 3166-1 alpha-3, three-letter country code standardized to only those in 'country' table."
bag_dashboard_meldeformular		"Meldeformular data provided by the BAG and updated daily. This table is tailored for the timeseries dashboard. Column definitions can be found in the Codebook COVID-19 for SSPH+."
bag_test_numbers		"Data provided by the BAG and updated daily."
bag_test_numbers	date	The date on which the tests were taken.
bag_test_numbers	positive_tests	The number of positive tests
bag_test_numbers	negative_tests	The number of negative tests
bag_test_numbers	canton	Canton code. It is only provided since 23.05.2020
bag_test_numbers	age_group	"The age group, for example, ""0 - 9"". It is only provided since 23.05.2020."
consensus_sequence	sample_name	Name given to the sequenced material (assigned by the sequencing center).
consensus_sequence	ethid	ETH identifier for a sample.
consensus_sequence	header	Fasta header in the data source (covid19-pangolin/backup/working/samples/<sample_name>/<date_flowcell>/references/ref_ambig.fasta). Format should be <sample_name>-<date>_<flowcell>
consensus_sequence	seq	"Consensus sequence. Positions with < 5x coverage are ""N"". Minor bases with >= 5% frequency and present in >= 2 reads contribute to an IUPAC ambiguity code at a position. Lowercase letters indicate < 50x coverage. Sequences may include gaps, which are coded with ""-""."
consensus_sequence	coverage	Mean coverage. From V-pipe output qa.csv.
consensus_sequence	r1_basequal	"FastQC per base quality flag for read 1. Based on quality scores. WARNING if the lower quartile for any base is less than 10, or if the median for any base is less than 25. FAIL if the lower quartile for any base is less than 5 or if the median for any base is less than 20. From V-pipe output qa.csv."
consensus_sequence	r2_basequal	"FastQC per base quality flag for read 2. Based on quality scores. WARNING if the lower quartile for any base is less than 10, or if the median for any base is less than 25. FAIL if the lower quartile for any base is less than 5 or if the median for any base is less than 20. From V-pipe output qa.csv."
consensus_sequence	rejreads	Percentage of reads rejected by Prinseq. From V-pipe output qa.csv.
consensus_sequence	alnreads	Percentage of kept reads that were aligned. From V-pipe output qa.csv.
consensus_sequence	insertsize	Number of bases covered by a read pair (amplicon length). From V-pipe output qa.csv.
consensus_sequence	consensus_n	Number of bases with less than 5 reads coverage. From V-pipe output qa.csv.
consensus_sequence	consensus_lcbases	Number of bases with less than 50 reads coverage. From V-pipe output qa.csv.
consensus_sequence	divergence	"Number of sites where seq != reference genome MN908947 and seq != N and seq != ""-"". From Nextstrain diagnostic.py output."
consensus_sequence	excess_divergence	divergence - expected_divergence where expected_divergence = (days between Dec 1 2019 and sample collection date) * 25/365. From Nextstrain diagnostic.py output.
consensus_sequence	number_n	"Number of bases coded ""N"". From Nextstrain diagnostic.py output."
consensus_sequence	number_gaps	Number of gaps compared to reference genome MN908947. From Nextstrain diagnostic.py output.
consensus_sequence	clusters	Regions with SNPs clustered in close genomic proximity. From Nextstrain diagnostic.py output.
consensus_sequence	gaps	"Gaps compared to reference genome MN908947. From Nextstrain diagnostic.py output. These ranges are 0-indexed and start-exclusive, end-inclusive."
consensus_sequence	all_snps	Genome positions of SNPs compared to reference genome MN908947. From Nextstrain diagnostic.py output.
consensus_sequence	flagging_reason	Sequence quality issue flagged by Nextstrain diagnostic.py.
consensus_sequence	fail_reason	"Reason the sample fails quality control. ""no fail reason"" passes QC. Null values should correspond to rows that are not true samples. This column is generated by a script written by cEvo and the values should be treated with a grain of salt because the QC criteria we use to submit to GISAID has changed several times."
consensus_sequence	sequencing_center	"Where the sample was sequenced. fgcz = Functional Genomics Center Zurich, gfb = Genomics Facility Basel, h2030 = Health 2030."
consensus_sequence	sequencing_batch	<Date of the sequencing run in YYYYMMDD format>_<flowcell name>.
consensus_sequence	comment	Any notes about why the sample is special or additional information.
consensus_sequence	variant_of_concern	"The name of a variant of concern, or just <null>."
consensus_sequence	is_random	Whether the sample was randomly selected or part of a targeted investigation.
consensus_sequence	seq_unaligned	"Unaligned sequence as a character string. Column introduced in June 2021 to handle sequences with insertions; currently the column is only used on a case-by-case basis (e.g. for comparing PacBio results to Illumina)."
consensus_sequence	dont_release	Boolean specifying whether (TRUE) a sequence should be excluded from release.
consensus_sequence_mutation_nucleotide	sample_name	The sample name as in consensus_sequence.
consensus_sequence_mutation_nucleotide	position	The position in the sequence as aligned as usual.
consensus_sequence_mutation_nucleotide	mutation	The mutated base codon in the sequence. It has to be upper-case!
consensus_sequence_nextclade_mutation_aa		The amino acid mutations that Nextclade detected.
consensus_sequence_nextclade_mutation_aa	sample_name	The sample name as in consensus_sequence.
consensus_sequence_nextclade_mutation_aa	aa_mutation	"The mutation in the Nextclade format. Examples: ""S:N123Y"", ""ORF8:Y45-"""
consensus_sequence_nextclade_data		Results from Nextclade without the mutations
country		A list of countries
country_old		An incomplete (!) list of countries used by the timeseries dashboard
dashboard_state		Stores general state information of the timeseries dashboard. It should contain exactly one row.
dashboard_state	last_data_update	The date on which the bag_dashboard_meldeformular table was last updated.
gisaid_sequence		The GISAID nextfasta and nextmeta dataset.
gisaid_sequence	date	"The parsed date. If the dataset only provides a year or a month, it is undefined which exact day will be set."
gisaid_sequence	date_str	"The date in the same format as it was provided in the dataset. E.g., in some cases, only the year is provided."
gisaid_sequence	original_seq	The sequence from the dataset.
gisaid_sequence	aligned_seq	The sequence after alignment with our reference using mafft.
gisaid_sequence_mutation_nucleotide	strain	The strain as in gisaid_sequence.
gisaid_sequence_mutation_nucleotide	position	The position in the sequence as aligned as usual.
gisaid_sequence_mutation_nucleotide	mutation	The mutated base codon in the sequence. It has to be upper-case!
swiss_canton		Official canton codes and names in various languages according to p. 81 in https://www.bk.admin.ch/dam/bk/en/dokumente/sprachdienste/English%20Style%20Guide.pdf.download.pdf/english_style_guide.pdf
ext_swiss_demographic		"Demographic balance by age and canton (px-x-0102020000_104), provided by the Swiss Federal Statistical Office, https://www.pxweb.bfs.admin.ch/pxweb/en/px-x-0102020000_104/px-x-0102020000_104/px-x-0102020000_104.px, contains data between 2010 and 2019."
variant_mutation		The mutations that characterize a variant.
variant_mutation	variant_name	The name of a variant
variant_mutation	aa_mutation	"The mutation in the Nextclade format. Examples: ""S:N123Y"", ""ORF8:Y45-"""
viollier_plate		Viollier's plates containing both substance of positive and negative tests
viollier_plate	viollier_plate_name	Plate names set by Viollier. They usually have the format <day><month><year>eg<number> or <day><month><year>wuhan<number>. The name is stored as lower case.
viollier_plate	gfb_number	"The name a plate gets when it is shipped to the Genomics Facility Basel. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to GFB. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
viollier_plate	fgcz_name	"The name a plate gets when it is shipped to the Functional Genomics Center Zurich. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to FGCZ. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
viollier_plate	sequencing_center	"Where the plate was sent for sequencing, or whether it stayed at Viollier for in-house sequencing."
viollier_plate	left_viollier_date	The date when a plate was shipped to a genomic facility. It was introduced in October and is not set for earlier entries.
viollier_plate	has_no_extract	The plate was run on a BioRad CFX machine without producing a highly purified RNA extract.
viollier_plate	comment	Additional information as free text.
viollier_test		Information about both positive and negative tests at Viollier. It should contain all positive tests but not all negative tests.
viollier_test	sample_number	"A unique number for every test, it is defined by the diagnostic lab."
viollier_test	ethid	A unique number defined by us. It is defined for all samples that we sequenced and more.
viollier_test	order_date	The order date
viollier_test	zip_code	The zip code of the office of the doctor.
viollier_test	city	The city of the office of the doctor.
viollier_test	canton	The canton of the office of the doctor.
viollier_test	pcr_code	A technical code set by Viollier. 4 corresponds to a positive test.
viollier_test	is_positive	Whether a test is positive. It should be true if and only if pcr_code is 4.
viollier_test	purpose	"The sampling strategy/purpose. Allowed values are currently ""surveillance"" and ""diagnostic""."
viollier_test	sequenced_by_viollier	Whether the plate was sequenced by Viollier; Viollier started sequencing in week 15/16 of 2021.
viollier_test	comment	Additional information as free text.
viollier_test__viollier_plate		Connects the viollier_test and viollier_plate table
viollier_test__viollier_plate	sample_number	As defined in viollier_test
viollier_test__viollier_plate	viollier_plate_name	As defined in viollier_plate
viollier_test__viollier_plate	well_position	The position of the sample on the plate. It is in upper-case.
viollier_test__viollier_plate	e_gene_ct	"As of 11.06.2021 Viollier only provides a single CT value, which is written into this column. "
viollier_test__viollier_plate	rdrp_gene_ct	CT value of the RdRp gene
viollier_test__viollier_plate	seq_request	Boolean specifying whether (TRUE) a sample was included on the list of samples we sent to the sequencing centers to sequence. This column is filled in from queries in script sql/viollier_test.sql and has only been filled in for samples sent from viollier beginning on 2021-04-19.
bag_sequence_report	auftraggeber_nummer	"A unique number for every test, it is defined by the BAG. Called the 'sample_number' in other tables. Blank for non-Viollier samples unless we are able to get this number from the data source (e.g. hospital)."
bag_sequence_report	alt_seq_id	Any other identifier we have for a sequence that could help the BAG identify it.
bag_sequence_report	viro_purpose	"The reason for sampling. ""outbreak"" (e.g. outbreak investigation, or re-sampling of a possible re-infection case); ""travel_case"" (e.g. sequences from people recently arrived from the U.K.); ""surveillance"" (random sampling of regular laboratory positive samples); ""screening"" (random sampling of asymptomatic individuals, e.g. of Army recruits)."
bag_sequence_report	viro_source	"The source of the sample. ""Swiss Viollier Sequencing Consortium""; ""Armee""; hospital or laboratory information."
bag_sequence_report	viro_seq	"The group responsible for sequencing. ""ETHZ, D-BSSE"""
bag_sequence_report	viro_characterised	"How the sample was analysed. ""no"" (not analysed); ""mcPCR_501Y"", ""sangerS"", ""wgs"" (whole-genome sequencing)"
bag_sequence_report	viro_gisaid_id	GISAID identifier for the sequence. Blank for un-released (low quality) or not-yet-released samples.
bag_sequence_report	viro_genbank_id	Genbank identifier for the sequence. Blank for un-released (low quality) or not-yet-released samples.
bag_sequence_report	viro_ref_sequence_id	"Reference sequence against which mutations are called. ""MN908947.3"" (https://www.ncbi.nlm.nih.gov/nuccore/MN908947)"
bag_sequence_report	viro_relevant_mutations_to_ref_seq	"List of amino acid mutations relative to the reference sequence. For ETHZ, D-BSSE sequences these are generated with the Nextclade tool."
bag_sequence_report	viro_label	"Variant of concern label. ""B.1.1.7""; ""501Y.V2"""
non_viollier_test		This table is used to store metadata for samples that don't come from Viollier.
ext_demography_age		"The number of people per age group (5-years brackets) and country. The numbers are estimates made by the UN for the year 2020. The data file ""Population by Age Groups - Both Sexes"" was downloaded on 27.03.2021 from https://population.un.org/wpp/Download/Standard/Population/ / https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/1_Population/WPP2019_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.xlsx."
ext_owid_global_cases	iso_country	ISO 3166-1 alpha-3 three-letter country codes standardized to only those in 'country' table
ext_owid_global_cases	country	Location as it appears in the source data.
ext_owid_global_cases	date	Date of observation.
ext_owid_global_cases	new_cases_per_million	"New confirmed cases of COVID-19 per 1,000,000 people from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University."
ext_owid_global_cases	new_deaths_per_million	"New deaths attributed to COVID-19 per 1,000,000 people from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University."
ext_owid_global_cases	new_cases	New confirmed cases of COVID-19 from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
ext_owid_global_cases	new_deaths	New deaths attributed to COVID-19 from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
ext_fso_tourist_accomodation		"For more information on this table, see the saved query to the FSO database: https://www.pxweb.bfs.admin.ch/sq/57be922e-05fb-4cad-ab19-bd823c54fb1d"
ext_fso_tourist_accomodation	iso_country	"ISO 3166-1 alpha-3 three-letter country codes standardized to those in 'country' table. Unknown countries are coded as 'XXX', e.g. 'Other European countries' in data source"
ext_fso_tourist_accomodation	country	Origin country of travellers as it appears in the data source.
ext_fso_tourist_accomodation	date_type	Timespan of date. Monthly.
ext_fso_tourist_accomodation	date	"Date of observation. The data is monthly, so dates are the first of the month."
ext_fso_tourist_accomodation	n_arrivals	Number of travellers arrived.
ext_fso_cross_border_commuters		"For more information on this table, see the saved query to the FSO database: https://www.pxweb.bfs.admin.ch/sq/1f6791fe-7f97-4466-a1e7-82314ede5277"
ext_fso_cross_border_commuters	iso_code	"ISO 3166-1 alpha-3 three-letter country codes standardized to those in 'country' table. Unknown countries are coded as 'XXX', e.g. 'Andere' in data source."
ext_fso_cross_border_commuters	country	Origin country of workers as it appears in the data source.
ext_fso_cross_border_commuters	date_type	Timespan of date. Quarterly.
ext_fso_cross_border_commuters	wirtschaftsabteilung	Employment sector (in German).
ext_fso_cross_border_commuters	date	"Date of observation. The data is quarterly, so dates are the first of the first month of the quarter."
ext_fso_cross_border_commuters	n_permits	Number of cross-border workers with G permit.
frameshift_deletion_diagnostic	sample_name	Name given to the sequenced material (assigned by the sequencing center).
frameshift_deletion_diagnostic	start_position	Start position of deletion with respect to reference genome MN908947.
frameshift_deletion_diagnostic	indel_type	Type of mutation detected (deletion/insertion/stopgain/stoploss).
frameshift_deletion_diagnostic	length	Length of deletion (in nucleotide units).
frameshift_deletion_diagnostic	gene_region	Gene in which the deletion is found. Lara says the nucleotide position to gene mapping is taken somehow from the visualization part of V-pipe: https://github.com/cbg-ethz/V-pipe/blob/caesar_div/references/gffs/Genes_NC_045512.2.GFF3
frameshift_deletion_diagnostic	reads_all	Total number of reads covering the first position of the deletion.
frameshift_deletion_diagnostic	reads_fwd	Total number of forward reads covering the deletion.
frameshift_deletion_diagnostic	reads_rev	Total number of reverse reads covering the deletion.
frameshift_deletion_diagnostic	deletions	Number of reads supporting the deletion.
frameshift_deletion_diagnostic	freq_del	Fraction of reads supporting the deletion.
frameshift_deletion_diagnostic	freq_del_fwd	Fraction of forward reads supporting the deletion.
frameshift_deletion_diagnostic	freq_del_rev	Fraction of reverse reads supporting the deletion.
frameshift_deletion_diagnostic	deletions_fwd	Number of forward reads supporting the deletion.
frameshift_deletion_diagnostic	deletions_rev	Number of reverse reads supporting the deletion.
frameshift_deletion_diagnostic	insertions	Number of reads supporting the insertion.
frameshift_deletion_diagnostic	freq_insert	Fraction of reads supporting the insertion.
frameshift_deletion_diagnostic	freq_insert_fwd	Fraction of forward reads supporting the insertion.
frameshift_deletion_diagnostic	freq_insert_rev	Fraction of reverse reads supporting the insertion.
frameshift_deletion_diagnostic	insertions_fwd	Number of forward reads supporting the insertion.
frameshift_deletion_diagnostic	insertions_rev	Number of reverse reads supporting the insertion.
frameshift_deletion_diagnostic	matches_ref	Number of reads where the base matches the ref-base.
frameshift_deletion_diagnostic	pos_critical_inserts	Start positions (in reference genome coordinates) of insertions in the same gene_region that occur in > 40% of reads.
frameshift_deletion_diagnostic	pos_critical_dels	Start positions (in reference genome coordinates) of deletions in the same gene_region that occur in > 40% of reads.
frameshift_deletion_diagnostic	homopolymeric	"True if either around the start or end position of the deletion three bases are the same, which may have caused the polymerase to skip during reverse transcription of viral RNA to cDNA, e.g. AATAG."
frameshift_deletion_diagnostic	ref_base	Base in the reference genome.
frameshift_deletion_diagnostic	indel_diagnosis	Summary of support for / explanation of the frameshift insertion/deletion or the stopgain/stoploss event. This is reported to GISAID upon submission.
frameshift_deletion_diagnostic	indel_position	Long form summary of insertion/deletion/stopgain/stoploss position for report to GISAID.
frameshift_deletion_diagnostic	stops	Number of reads supporting the stop codon (count is done at the first position of the codon).
frameshift_deletion_diagnostic	freq_stop	Fraction of reads supporting the stop codon.
frameshift_deletion_diagnostic	freq_stop_fwd	Fraction of forward reads supporting the stop codon.
frameshift_deletion_diagnostic	freq_stop_rev	Fraction of reverse reads supporting the stop codon.
frameshift_deletion_diagnostic	stops_fwd	Number of forward reads supporting the stop codon.
frameshift_deletion_diagnostic	stops_rev	Number of reverse reads supporting the stop codon.
swiss_wastewater_plant		Wastewater plant names.
foph_travel_quarantine		Compilation of FOPH travel quarantine orders (created by hand on 10.03.2021). Sources are copies of the quarantine law from https://www.bag.admin.ch/bag/en/home/krankheiten/ausbrueche-epidemien-pandemien/aktuelle-ausbrueche-epidemien/novel-cov/empfehlungen-fuer-reisende/liste.html and https://www.fedlex.admin.ch/eli/cc/2020/496/en.
ext_country_coordinates		"Coordinates of the centroids of various countries taken from the R package CoordinateCleaner's data, which they say they get from http://geo-locate.org. Since there are multiple entries per country, these are the average coordinates."
sequence_identifier	ethid	"This column is the key column: each ethid corresponds to a unique sample, and each sample should only be published once."
sequence_identifier	gisaid_id	"This column is filled in by the script import_gisaid_epi_isl.R, which relies on data downloads from GISAID to map between ethid and GISAID EPI accession once a sequence has been accepted."
sequence_identifier	sample_name	This column is filled in by the script export_gisaid_submission.R when the GISAID submission is prepared. It only began being filled out in mid-March or early April 2021.
sequence_identifier	gisaid_uploaded_at	This column is filled in by the script export_gisaid_submission.R when the GISAID submission is prepared. It only began being filled out in mid-April 2021.
sequence_identifier	ena_id	This column was filled in based on a semi-manual mapping of virus names between GISAID and ENA from SPSP in April 2021.
pangolin_lineage_alias		"If a lineage with three numbers (e.g., B.1.617.2) is about to receive a sub-lineage, it gets an alias. For example, B.1.617.2 has the alias AY. B.1.617.2.1 is then AY.1. See https://github.com/cov-lineages/pango-designation/blob/master/alias_key.json for a list of the aliases. Recombinant lineages (which have more than one parental lineage) are excluded."
ext_problematic_site		"The problematic (nucleotide) sites. Source: https://github.com/W-L/ProblematicSites_SARS-CoV2 (v5, commit 31ad9d4)"
lab_code_foph		"This table was introduced 31.08.2021 after email ""FOPH - Data flow national genomic SARS-CoV-2 surveillance program"" from SPSP. It's used by the script R/export_spsp_submission.R"
lab_code_foph	lab_code_foph	A 5-digit ID code that FOPH assigned to each diagnostics lab for electronic reporting to MSys according to the FOPH.
lab_code_foph	lab_name	Name of the lab according to the FOPH-provided list.
lab_code_foph	covv_orig_lab	Name of the lab as we submit it to SPSP.
billing_report		"This view was created using ""billing.sql"" by SN. It aims to report number of samples FROM VIOLLIER sent for sequencing, finished sequencing, and successfully submitted per sequencing center per week."
billing_report	week_finished	"Boolean. Starting with 2021 week 22, true if the number of sequence requests equals the number of samples for which we got back sequencing data."
billing_report	samples_sent	Number of positive samples with metadata in viollier_test that are on a plate recorded as having left viollier.
billing_report	samples_seq_request	Number of positive samples with metadata in viollier_test that are on a plate recorded as having left viollier AND have seq_request 'true' in viollier_test__viollier_plate.
billing_report	sequencing_batches	"All the distinct sequencing batches found among samples collected in this week, sent to this sequencing center, for which we received data back."
billing_report	sequenced	"Number of distinct samples (where a sample is a metadata entry in viollier_test) collected in this week, sequenced by this sequencing center, for which we received data back."
billing_report	submittable	Number of 'sequenced' samples where fail_reason in consensus_sequence is 'no_fail_reason'.
billing_report	gisaid_submitted_but_not_confirmed	Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) but do not have a GISAID_EPI_ISL associated in the database yet.
billing_report	gisaid_confirmed_upload	Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) and have a GISAID_EPI_ISL associated.
billing_report	submitted_to_GISAID	Total number of sequences submitted to GISAID; sum of 'gisaid_submitted_but_not_confirmed' and 'gisaid_confirmed_upload'.
billing_report	submitted_to_GISAID_percentage	Fraction of 'sequenced' samples submitted to GISAID; 'submitted_to_GISAID' divided by 'sequenced'.
billing_report_team_w		"This view was created by SN. It aims to report number of samples FROM TEAM W sent for sequencing, finished sequencing, and successfully submitted per sequencing center per week."
billing_report_team_w	week_finished	"Boolean. Checks that the number of sequences received from a week is a multiple of 93 (the number of samples team w puts on a plate). As of Sept 2021, team w plates always contain samples from the same week."
billing_report_team_w	samples_sent	Number of entries in non_viollier_test from team w collected in this week.
billing_report_team_w	sequencing_batches	All the distinct sequencing batches found among samples collected in this week for which we received data back.
billing_report_team_w	sequenced	"Number of samples (where a sample is a metadata entry in non_viollier_test) collected in this week, sequenced by this sequencing center, for which we received data back."
billing_report_team_w	submittable	Number of 'sequenced' samples where fail_reason in consensus_sequence is 'no_fail_reason'.
billing_report_team_w	gisaid_submitted_but_not_confirmed	Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) but do not have a GISAID_EPI_ISL associated in the database yet.
billing_report_team_w	gisaid_confirmed_upload	Number of 'sequenced' samples that have been submitted (i.e. have an entry in sequence_identifier) and have a GISAID_EPI_ISL associated.
billing_report_team_w	submitted_to_GISAID	Total number of sequences submitted to GISAID; sum of 'gisaid_submitted_but_not_confirmed' and 'gisaid_confirmed_upload'.
billing_report_team_w	submitted_to_GISAID_percentage	Fraction of 'sequenced' samples submitted to GISAID; 'submitted_to_GISAID' divided by 'sequenced'.
z_test_metadata		Information about (positive and negative) COVID-19 PCR tests
z_test_metadata	test_id	A Swiss-wide unique identifier for a test. It has the format `<lab name>/<Laborauftragsnummer>`.
z_test_metadata	ethid	"A unique number defined by us. It has to be defined for all samples that we sequence. This number will be added to the ""strain"" name that we submit to public databases."
z_test_metadata	order_date	The date on which the lab received the order. We expect it to be the same as the sampling date.
z_test_metadata	canton	The canton of sampling or of the office of the doctor.
z_test_metadata	zip_code	The zip code of sampling or of the office of the doctor.
z_test_metadata	city	The city of sampling or of the office of the doctor.
z_test_metadata	is_positive	Whether a test is positive
z_test_metadata	comment	Additional information as free text
z_extraction_plate		Information about plates produced by the originating labs containing RNA extracts and sent to the sequencing centers
z_extraction_plate	extraction_plate_id	A Swiss-wide unique identifier for an extraction plate. It has the format `<lab name>/<plate name given by the lab>`.
z_extraction_plate	gfb_number	"The name a plate gets when it is shipped to the Genomics Facility Basel. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to GFB. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
z_extraction_plate	fgcz_name	"The name a plate gets when it is shipped to the Functional Genomics Center Zurich. As of 11.06.2021 full plates no longer get renamed, so a ""-"" in this column after this date just indicates a plate was sent to FGCZ. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter."
z_extraction_plate	health2030	A boolean indicating whether a plate was shipped to Health 2030. Another change was implemented 22.06.21 where we switch from writing into this column to sequencing_center. This column will not be updated thereafter.
z_extraction_plate	left_lab_or_received_metadata_date	"Originally, it was the date when a plate was shipped to a genomic facility. It was introduced in October 2020 and is not set for earlier entries. Since June 2021, we don't necessarily know the exact date when a plate was shipped but rather insert the date when we received the metadata."
z_extraction_plate	sequencing_center	"Where the plate was sent for sequencing, or whether it stayed at Viollier for in-house sequencing."
z_extraction_plate	viollier_extract_free	"A boolean indicating whether this plate is from Viollier and ""extract free"". Extract free plates should contain the RNA material but are not purified. So far, we have not been able to generate sequences from those plates."
z_extraction_plate	comment	Additional information as free text
z_sequencing_plate		Information about plates that go into the sequencers. They may be equivalent to the extraction plates.
z_sequencing_plate	sequencing_plate_id	A unique identifier for a sequencing plate.
z_sequencing_plate	sequencing_center	The genomic facility that sequenced the plate
z_sequencing_plate	sequencing_date	The date of sequencing
z_sequencing_plate	comment	Additional information as free text
z_test_plate_mapping		"A mapping between test_metadata, extraction_plate and sequencing_plate"
z_test_plate_mapping	test_id	As defined in test_metadata
z_test_plate_mapping	old_sample	"A sample that left Viollier before 07 June 2021 is considered as old. We have some wrong mappings in the old data: for example, we have multiple tests mapped to the same plate and well. For the new data (i.e., old_sample=false), we will prevent this by enforcing a uniqueness of plate-well combinations in this table."
z_test_plate_mapping	extraction_plate	As defined in extraction_plate
z_test_plate_mapping	extraction_plate_well	The well position on the extraction plate
z_test_plate_mapping	extraction_e_gene_ct	"As of 11.06.2021 Viollier only provides a single CT value, which is written into this column."
z_test_plate_mapping	extraction_rdrp_gene_ct	The RdRp gene CT
z_test_plate_mapping	sequencing_plate	As defined in sequencing_plate
z_test_plate_mapping	sequencing_plate_well	The well position on the sequencing plate
z_test_plate_mapping	sample_type	"The sample type. Valid values are ""clinical"", ""empty"", ""positive control"", ""negative control"", and ""wastewater""."
z_consensus_sequence		The generated consensus sequences
z_consensus_sequence	sample_name	Name given to the sequenced material (ultimately assigned by the sequencing center)
z_consensus_sequence	sequencing_plate	As defined in sequencing_plate
z_consensus_sequence	sequencing_plate_well	The well position on the sequencing plate
z_consensus_sequence	insert_date	The date when we first insert the sequence
z_consensus_sequence	update_date	The date when we update the sequence
z_consensus_sequence	sequencing_center	The genomic facility that did the sequencing
z_consensus_sequence	sequencing_batch	The V-Pipe sequencing batch name
z_consensus_sequence	seq_aligned	The aligned version of the sequence
z_consensus_sequence	seq_unaligned	The unaligned version of the sequence
z_consensus_sequence	ethid	ETH identifier for a sample.
z_consensus_sequence_meta		Information that is derived from or related to a consensus sequence. This includes QC values and lineage information.
z_consensus_sequence_meta	sample_name	As defined in consensus_sequence
z_consensus_sequence_meta	references	-
z_consensus_sequence_meta	coverage_mean	-
z_consensus_sequence_meta	r1_basequal	-
z_consensus_sequence_meta	r2_basequal	-
z_consensus_sequence_meta	rejreads	-
z_consensus_sequence_meta	alnreads	-
z_consensus_sequence_meta	insertsize	-
z_consensus_sequence_meta	consensus_n	-
z_consensus_sequence_meta	qc_result	"The summarized result. The column was formerly called ""fail_reason""."
z_consensus_sequence_meta	diagnostic_divergence	-
z_consensus_sequence_meta	diagnostic_excess_divergence	-
z_consensus_sequence_meta	diagnostic_number_n	-
z_consensus_sequence_meta	diagnostic_number_gaps	-
z_consensus_sequence_meta	diagnostic_clusters	-
z_consensus_sequence_meta	diagnostic_gaps	-
z_consensus_sequence_meta	diagnostic_all_snps	-
z_consensus_sequence_meta	diagnostic_flagging_reason	-
z_consensus_sequence_meta	nextclade_clade	-
z_consensus_sequence_meta	nextclade_qc_overall_score	-
z_consensus_sequence_meta	nextclade_qc_overall_status	-
z_consensus_sequence_meta	nextclade_total_gaps	-
z_consensus_sequence_meta	nextclade_total_insertions	-
z_consensus_sequence_meta	nextclade_total_missing	-
z_consensus_sequence_meta	nextclade_total_mutations	-
z_consensus_sequence_meta	nextclade_total_non_acgtns	-
z_consensus_sequence_meta	nextclade_total_pcr_primer_changes	-
z_consensus_sequence_meta	nextclade_alignment_start	-
z_consensus_sequence_meta	nextclade_alignment_end	-
z_consensus_sequence_meta	nextclade_alignment_score	-
z_consensus_sequence_meta	nextclade_qc_missing_data_score	-
z_consensus_sequence_meta	nextclade_qc_missing_data_status	-
z_consensus_sequence_meta	nextclade_qc_missing_data_total	-
z_consensus_sequence_meta	nextclade_qc_mixed_sites_score	-
z_consensus_sequence_meta	nextclade_qc_mixed_sites_status	-
z_consensus_sequence_meta	nextclade_qc_mixed_sites_total	-
z_consensus_sequence_meta	nextclade_qc_private_mutations_cutoff	-
z_consensus_sequence_meta	nextclade_qc_private_mutations_excess	-
z_consensus_sequence_meta	nextclade_qc_private_mutations_score	-
z_consensus_sequence_meta	nextclade_qc_private_mutations_status	-
z_consensus_sequence_meta	nextclade_qc_private_mutations_total	-
z_consensus_sequence_meta	nextclade_qc_snp_clusters_clustered	-
z_consensus_sequence_meta	nextclade_qc_snp_clusters_score	-
z_consensus_sequence_meta	nextclade_qc_snp_clusters_status	-
z_consensus_sequence_meta	nextclade_qc_snp_clusters_total	-
z_consensus_sequence_meta	nextclade_errors	-
z_consensus_sequence_meta	pango_lineage	-
z_consensus_sequence_meta	pango_probability	-
z_consensus_sequence_meta	pango_learn_version	-
z_consensus_sequence_meta	pango_status	-
z_consensus_sequence_meta	pango_note	-
z_consensus_sequence_mutation_aa		The amino acid mutations of a sequence
z_consensus_sequence_mutation_aa	sample_name	As defined in consensus_sequence
z_consensus_sequence_mutation_aa	aa_mutation	An amino acid mutation
z_consensus_sequence_mutation_nucleotide		The nucleotide mutations of a sequence
z_consensus_sequence_mutation_nucleotide	sample_name	As defined in consensus_sequence
z_consensus_sequence_mutation_nucleotide	nuc_mutation	A nucleotide mutation
z_consensus_sequence_notes		Manually curated notes about sequences
z_consensus_sequence_notes	sample_name	As defined in consensus_sequence
z_consensus_sequence_notes	release_decision	"Can be used to manually override ""qc_result"""
z_consensus_sequence_notes	purpose	A specific reason for sequencing
z_consensus_sequence_notes	comment	Additional information as free text