Experiment metadata

Experiment metadata describes the conditions of your sequencing experiment. If you use different experimental settings for your samples, you must also document the connections between experiments and samples. Please note that experiment metadata is only required for FASTQ and BAM/CRAM submissions.

Metadata fields

The table below lists the fields that can be specified for an experiment. An asterisk (*) marks a required field.

Field name Description
Alias* The submitter’s designated name for the experiment. The name must be unique within the submission account.
Design name* A name of the design (e.g. “Whole genome sequencing - genomic library”)
Instrument model* The sequencing platform according to a controlled vocabulary (see permitted values)
Library name The submitter’s name for the library (e.g. “Solexa-32824”)
Library layout* Whether to expect single or paired reads (“PAIRED” or “SINGLE”)
Library source* The type of source material that is being sequenced according to a controlled vocabulary (see permitted values)
Library selection* Method used to enrich the target in the sequence library preparation according to a controlled vocabulary (see permitted values)
Library construction protocol Free form text describing the protocol by which the sequencing library was constructed
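
The required fields and the alias uniqueness rule above can be checked locally before submission. The following is a minimal sketch in Python, assuming each experiment is held as a dictionary keyed by the field names in the table; the column headers in the actual templates may differ, and check_experiment and duplicate_aliases are hypothetical helper names, not part of any EGA tooling.

```python
# Required fields, taken from the asterisks in the table above.
# Assumption: records use these field names as dictionary keys.
REQUIRED_FIELDS = (
    "Alias",
    "Design name",
    "Instrument model",
    "Library layout",
    "Library source",
    "Library selection",
)
LIBRARY_LAYOUTS = {"PAIRED", "SINGLE"}


def check_experiment(record):
    """Return a list of problems found in one experiment record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (record.get(field) or "").strip():
            problems.append(f"missing required field: {field}")
    layout = (record.get("Library layout") or "").strip()
    if layout and layout not in LIBRARY_LAYOUTS:
        problems.append(f"invalid library layout: {layout}")
    return problems


def duplicate_aliases(records):
    """Return the aliases that occur more than once, sorted alphabetically."""
    seen, duplicates = set(), set()
    for record in records:
        alias = (record.get("Alias") or "").strip()
        if not alias:
            continue  # a missing alias is reported by check_experiment instead
        if alias in seen:
            duplicates.add(alias)
        seen.add(alias)
    return sorted(duplicates)
```

This check only covers aliases within the records you pass in; uniqueness across the whole submission account can only be verified by the submission system itself.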

Experiment metadata templates

Templates for experiment metadata are available in the following formats:

File format Download Description
Text CSV experiment-metadata.csv Plain text template without validation
ODF Spreadsheet experiment-metadata.ods LibreOffice spreadsheet with validation
Office Open XML Spreadsheet experiment-metadata.xlsx Microsoft Excel spreadsheet with validation

Controlled vocabularies

Use the following vocabularies when entering experiment metadata.

Permitted values for platform

The EGA Submitter Portal does not ask for this information, but the platform provides the context for the instrument (see the instrument vocabulary below). The table below has been compiled from ENA’s schema specification on GitHub.

Platform Description
LS454 454 technology uses 1-color sequential flows
ILLUMINA Illumina is a 4-channel flowgram with 1-to-1 mapping between basecalls and flows
HELICOS Helicos is similar to 454 technology and uses 1-color sequential flows
ABI_SOLID ABI is a 4-channel flowgram with 1-to-1 mapping between basecalls and flows
COMPLETE_GENOMICS CompleteGenomics platform type. At present there is no instrument model.
BGISEQ
OXFORD_NANOPORE Oxford Nanopore platform type. Nanopore-based electronic single-molecule analysis.
PACBIO_SMRT PacificBiosciences platform type for the single molecule real time (SMRT) technology.
ION_TORRENT Ion Torrent Personal Genome Machine (PGM) from Life Technologies.
CAPILLARY Sequencers based on capillary electrophoresis technology manufactured by LifeTech (formerly Applied Biosystems).
DNBSEQ Sequencers based on DNBSEQ by MGI Tech.

Permitted values for instrument

Use this vocabulary to specify the kind of instrument that was used. Select one of the instrument models below. The table below has been compiled from ENA’s schema specification on GitHub. For explanations of the values in the platform column, see Permitted values for platform above.

Instrument model Platform Remarks
454 GS LS454
454 GS 20 LS454
454 GS FLX LS454
454 GS FLX+ LS454
454 GS FLX Titanium LS454
454 GS Junior LS454
unspecified LS454
HiSeq X Five ILLUMINA
HiSeq X Ten ILLUMINA
Illumina Genome Analyzer ILLUMINA
Illumina Genome Analyzer II ILLUMINA
Illumina Genome Analyzer IIx ILLUMINA
Illumina HiScanSQ ILLUMINA
Illumina HiSeq 1000 ILLUMINA
Illumina HiSeq 1500 ILLUMINA
Illumina HiSeq 2000 ILLUMINA
Illumina HiSeq 2500 ILLUMINA
Illumina HiSeq 3000 ILLUMINA
Illumina HiSeq 4000 ILLUMINA
Illumina HiSeq X ILLUMINA
Illumina iSeq 100 ILLUMINA
Illumina MiSeq ILLUMINA
Illumina MiniSeq ILLUMINA
Illumina NovaSeq 6000 ILLUMINA
NextSeq 500 ILLUMINA
NextSeq 550 ILLUMINA
NextSeq 1000 ILLUMINA
NextSeq 2000 ILLUMINA
unspecified ILLUMINA
Helicos HeliScope HELICOS
unspecified HELICOS
AB SOLiD System ABI_SOLID Undifferentiated early AB SOLiD system
AB SOLiD System 2.0 ABI_SOLID
AB SOLiD System 3.0 ABI_SOLID
AB SOLiD 3 Plus System ABI_SOLID
AB SOLiD 4 System ABI_SOLID
AB SOLiD 4hq System ABI_SOLID
AB SOLiD PI System ABI_SOLID
AB 5500 Genetic Analyzer ABI_SOLID
AB 5500xl Genetic Analyzer ABI_SOLID
AB 5500xl-W Genetic Analysis System ABI_SOLID
unspecified ABI_SOLID
Complete Genomics COMPLETE_GENOMICS
unspecified COMPLETE_GENOMICS
BGISEQ-50 BGISEQ
BGISEQ-500 BGISEQ
MGISEQ-2000RS BGISEQ
PacBio RS PACBIO_SMRT
PacBio RS II PACBIO_SMRT
Sequel PACBIO_SMRT
Sequel II PACBIO_SMRT
Sequel IIe PACBIO_SMRT
unspecified PACBIO_SMRT
Ion Torrent PGM ION_TORRENT
Ion Torrent Proton ION_TORRENT
Ion Torrent S5 ION_TORRENT
Ion Torrent S5 XL ION_TORRENT
Ion Torrent Genexus ION_TORRENT
Ion GeneStudio S5 ION_TORRENT
Ion GeneStudio S5 Prime ION_TORRENT
Ion GeneStudio S5 Plus ION_TORRENT
unspecified ION_TORRENT
AB 3730xL Genetic Analyzer CAPILLARY
AB 3730 Genetic Analyzer CAPILLARY
AB 3500xL Genetic Analyzer CAPILLARY
AB 3500 Genetic Analyzer CAPILLARY
AB 3130xL Genetic Analyzer CAPILLARY
AB 3130 Genetic Analyzer CAPILLARY
AB 310 Genetic Analyzer CAPILLARY
unspecified CAPILLARY
DNBSEQ-T7 DNBSEQ
DNBSEQ-G400 DNBSEQ
DNBSEQ-G50 DNBSEQ
DNBSEQ-G400 FAST DNBSEQ
unspecified DNBSEQ
MinION OXFORD_NANOPORE
GridION OXFORD_NANOPORE
PromethION OXFORD_NANOPORE
unspecified OXFORD_NANOPORE
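
Because the instrument model determines the platform, the platform can be derived with a simple lookup instead of being entered by hand. The sketch below is in Python and contains only an excerpt of the table above; the dictionary would need to be extended with the remaining rows, and platform_for is a hypothetical helper name.

```python
# Excerpt of the instrument table above (instrument model -> platform).
# Assumption: the remaining rows are added in the same way before real use.
INSTRUMENT_PLATFORM = {
    "Illumina NovaSeq 6000": "ILLUMINA",
    "NextSeq 500": "ILLUMINA",
    "Sequel II": "PACBIO_SMRT",
    "MinION": "OXFORD_NANOPORE",
    "DNBSEQ-T7": "DNBSEQ",
    "Ion Torrent PGM": "ION_TORRENT",
}


def platform_for(instrument_model):
    """Return the platform for an instrument model.

    "unspecified" is listed under several platforms in the table, so it
    cannot be resolved to a single platform and yields None.
    """
    if instrument_model == "unspecified":
        return None
    try:
        return INSTRUMENT_PLATFORM[instrument_model]
    except KeyError:
        raise ValueError(
            f"instrument model not in lookup table: {instrument_model!r}"
        ) from None
```

The lookup is deliberately exact and case-sensitive, so a value such as "minion" is rejected rather than silently corrected.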

Permitted values for library source

Use this vocabulary to specify the type of source material that was sequenced. Select one of the values below. The table below has been compiled from ENA’s schema specification on GitHub.

Library source Description
GENOMIC Genomic DNA (includes PCR products from genomic DNA).
GENOMIC SINGLE CELL
TRANSCRIPTOMIC Transcription products or non genomic DNA (EST, cDNA, RT-PCR, screened libraries).
TRANSCRIPTOMIC SINGLE CELL
METAGENOMIC Mixed material from metagenome.
METATRANSCRIPTOMIC Transcription products from community targets
SYNTHETIC Synthetic DNA.
VIRAL RNA Viral RNA.
OTHER Other, unspecified, or unknown library source material.
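
Values typed by hand often differ from the vocabulary in case or spacing. The following Python sketch normalizes a library source value to the spelling in the table above and suggests the closest permitted value otherwise. It assumes that upper-casing and collapsing whitespace is an acceptable clean-up step before submission; normalize_library_source is a hypothetical helper name.

```python
import difflib

# Permitted values, copied from the table above.
LIBRARY_SOURCES = (
    "GENOMIC",
    "GENOMIC SINGLE CELL",
    "TRANSCRIPTOMIC",
    "TRANSCRIPTOMIC SINGLE CELL",
    "METAGENOMIC",
    "METATRANSCRIPTOMIC",
    "SYNTHETIC",
    "VIRAL RNA",
    "OTHER",
)


def normalize_library_source(value):
    """Return the permitted spelling of value or raise ValueError."""
    # Collapse runs of whitespace and compare in upper case.
    candidate = " ".join(value.split()).upper()
    if candidate in LIBRARY_SOURCES:
        return candidate
    # Offer the closest permitted value, if any is similar enough.
    close = difflib.get_close_matches(candidate, LIBRARY_SOURCES, n=1)
    hint = f"; did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"not a permitted library source: {value!r}{hint}")
```

The same approach does not carry over unchanged to the library selection vocabulary, whose values mix upper and lower case (for example "cDNA" and "size fractionation"), so upper-casing would not be a valid normalization there.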

Permitted values for library selection

Use this vocabulary to specify how the target was enriched in the sequence library preparation. Select one of the values below. The table below has been compiled from ENA’s schema specification on GitHub.

Library selection Description
RANDOM No Selection or Random selection
PCR target enrichment via PCR
RANDOM PCR Source material was selected by randomly generated primers.
RT-PCR target enrichment via RT-PCR
HMPR Hypo-methylated partial restriction digest
MF Methyl Filtrated
repeat fractionation Selection for less repetitive (and more gene rich) sequence through Cot filtration (CF) or other fractionation techniques based on DNA kinetics.
size fractionation Physical selection of size appropriate targets.
MSLL Methylation Spanning Linking Library
cDNA PolyA selection or enrichment for messenger RNA (mRNA); synonymize with PolyA
cDNA_randomPriming
cDNA_oligo_dT
PolyA PolyA selection or enrichment for messenger RNA (mRNA); should replace cDNA enumeration.
Oligo-dT enrichment of messenger RNA (mRNA) by hybridization to Oligo-dT.
Inverse rRNA depletion of ribosomal RNA by oligo hybridization.
Inverse rRNA selection depletion of ribosomal RNA by inverse oligo hybridization.
ChIP Chromatin immunoprecipitation
ChIP-Seq Chromatin immunoprecipitation, reveals binding sites of specific proteins, typically transcription factors (TFs), using antibodies to extract DNA fragments bound to the target protein.
MNase Identifies well-positioned nucleosomes. Micrococcal Nuclease (MNase) is an endo-exonuclease that processively digests DNA until an obstruction, such as a nucleosome, is reached.
DNase DNase I endonuclease digestion and size selection reveals regions of chromatin where the DNA is highly sensitive to DNase I.
Hybrid Selection Selection by hybridization in array or solution.
Reduced Representation Reproducible genomic subsets, often generated by restriction fragment size selection, containing a manageable number of loci to facilitate re-sampling.
Restriction Digest DNA fractionation using restriction enzymes.
5-methylcytidine antibody Selection of methylated DNA fragments using an antibody raised against 5-methylcytosine or 5-methylcytidine (m5C).
MBD2 protein methyl-CpG binding domain Enrichment by methyl-CpG binding domain.
CAGE Cap-analysis gene expression.
RACE Rapid Amplification of cDNA Ends.
MDA Multiple Displacement Amplification, a non-PCR based DNA amplification technique that amplifies minute quantities of DNA to levels suitable for genomic analysis.
padlock probes capture method Targeted sequence capture protocol covering an arbitrary set of nonrepetitive genomic targets. An example is capture bisulfite sequencing using padlock probes (BSPP).
other Other library enrichment, screening, or selection process.
unspecified Library enrichment, screening, or selection is not specified.