SweGen whole-genome sequencing from the Swedish Twin Registry

Swedish population

genetic variation

reference dataset

genomics

Dataset

Publisher

Uppsala University

Published

4 April 2025

The dataset contains whole-genome sequencing (WGS) data as aligned read files in CRAM format (lossless compression) for a total of 942 DNA samples, selected to represent a cross-section of the Swedish population. The samples originate from the Swedish Twin Registry (STR) and were obtained from different geographical regions. For each of the 942 individuals, DNA was extracted from a blood sample and subjected to WGS. Sequencing was performed using 2x150 bp paired-end chemistry on Illumina HiSeq X Ten instruments at the SciLifeLab National Genomics Infrastructure (NGI) in Stockholm and Uppsala. The resulting FASTQ files were analyzed with the nf-core pipeline Sarek, which includes pre-processing, alignment to the human GRCh38 reference genome, and germline variant calling. All participants gave written informed consent, and the TwinGene study was approved by the regional ethics committee (Regionala Etikprövningsnämnden, Stockholm, dnr 2007-644-31, dnr 2014/521-32). Access to phenotypic information can be requested from the Swedish Twin Registry (http://ki.se/en/research/the-swedish-twin-registry).
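Because CRAM uses reference-based compression, reading the aligned files generally requires a FASTA of the reference genome they were aligned to. The following is a minimal sketch, not part of the dataset documentation, of how the `samtools view` commands for inspecting or converting one file could be assembled in Python. The file names `sample.cram`, `GRCh38.fa` and `sample.bam` are hypothetical placeholders, and the exact GRCh38 FASTA build needed is an assumption that should be checked against the files themselves.

```python
import shlex


def cram_header_command(cram_path, reference_fasta):
    """Build a samtools command that prints only the header of a CRAM file.

    -H selects header-only output; -T names the reference FASTA used to
    decode the reference-compressed reads.
    """
    return ["samtools", "view", "-H", "-T", str(reference_fasta), str(cram_path)]


def cram_to_bam_command(cram_path, reference_fasta, bam_path):
    """Build a samtools command that converts a CRAM file to BAM.

    -b requests BAM output and -o names the output file.
    """
    return [
        "samtools", "view", "-b",
        "-T", str(reference_fasta),
        "-o", str(bam_path),
        str(cram_path),
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with real paths after download.
    print(shlex.join(cram_header_command("sample.cram", "GRCh38.fa")))
    print(shlex.join(cram_to_bam_command("sample.cram", "GRCh38.fa", "sample.bam")))
```

The functions only build argument lists, so they can be passed to `subprocess.run(..., check=True)` on a system where samtools is installed. The header output (`@SQ` and `@PG` lines) is a reasonable place to confirm which reference sequences the reads were aligned against.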

This dataset is one of four datasets included in the study “SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population” (http://identifiers.org/ega.study:EGAS50000000906).

Official landing page: http://identifiers.org/ega.dataset:EGAD50000001326