Genotype and sequencing data that have been obtained and are available either through dbGaP or by an approved data request to JHS are listed in the sections below. Genetic analyses have been performed in accordance with participant consent. As a result, the samples analyzed on the various platforms largely overlap.
Microsatellite markers – 1,486 members of 264 families. Marshfield Marker Set 16.
Ancestry Informative Marker Panel – (n=4,605)
- Description: Approximately 1,500 genome-wide markers that are highly differentiated in frequency between Europeans and Africans. These markers provide less information than Affymetrix 6.0 GWAS genotypes but are available in a larger sample, which may allow, for example, imputation of rare variants of interest into related individuals.
- Methods: Genotyping methods and quality control are described in Nalls, et al.1
IBC Cardiovascular Candidate Gene Array2, 3 – (n=2,948)
- Genotyping was performed through NHLBI’s Candidate Gene Association Resource (CARe) consortium.
- Description: Gene-centric array interrogating ~55,000 SNPs selected to tag 2,100 CVD candidate genes chosen based on biologic function, involvement in CVD-related Mendelian syndromes, GWAS results, and other criteria.
- Methods: Design of the IBC Array is described in Keating et al.2 Genotyping and quality control, as well as organizational features of the CARe consortium, are described in Musunuru et al.3 A list of the genes and SNPs on the IBC Array is available on request.
Affymetrix 6.0 GWAS Genotyping – (n=3,029)
- Genotyping was performed through NHLBI’s Candidate Gene Association Resource (CARe) consortium.3
- Description: > 906,600 genome-wide tag SNPs and >946,000 probes for copy-number variation.
- Methods: Genotyping and quality control are described in Lettre, et al.4 Data have been imputed to the 1000 Genomes phase 1 v3 reference panel as described in Duan, et al.5
1000 Genomes Phase 3 – (n=3,029)
1000G Phase 3 Imputed Data: VCF files of dosage and likely genotypes for autosomal SNPs imputed from the 1000 Genomes Project (1000G) Phase 3 version 5 reference panel. Imputation was completed using Minimac3 on the Michigan Imputation Server (PMID 27571263). The reference panel includes 5,008 haplotypes from 26 populations across the world (http://www.internationalgenome.org). Prior to imputation, SNPs were filtered for minor allele frequency ≥1%, call rate ≥90%, and HWE p-value > 10^-6, and sites with invalid or mismatched alleles relative to the reference panel were excluded.
1) JHS Sample size (N) = 3,029 (includes 9 samples that are recommended to be excluded, based on quality control issues such as sex or pedigree mismatches)
2) Total SNPs imputed = 49,143,605 (not filtered for imputation quality or minor allele count)
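The pre-imputation SNP filter described above can be sketched as follows. This is a minimal illustration, not the pipeline actually run for JHS: the `SnpSummary` record, the function name, and the example values are hypothetical, and only the three thresholds (MAF ≥1%, call rate ≥90%, HWE p-value > 10^-6) come from the text. The allele-mismatch exclusion is not covered.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary statistics computed before imputation."""
    snp_id: str
    maf: float        # minor allele frequency
    call_rate: float  # fraction of samples with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value


def passes_pre_imputation_qc(snp, min_maf=0.01, min_call_rate=0.90, min_hwe_p=1e-6):
    """Return True if the SNP meets all three thresholds quoted in the text."""
    return (
        snp.maf >= min_maf
        and snp.call_rate >= min_call_rate
        and snp.hwe_p > min_hwe_p
    )


# Made-up example records, one failing each criterion.
snps = [
    SnpSummary("snp_a", maf=0.25, call_rate=0.99, hwe_p=0.40),
    SnpSummary("snp_b", maf=0.005, call_rate=0.99, hwe_p=0.40),  # fails MAF
    SnpSummary("snp_c", maf=0.25, call_rate=0.85, hwe_p=0.40),   # fails call rate
    SnpSummary("snp_d", maf=0.25, call_rate=0.99, hwe_p=1e-8),   # fails HWE
]

kept = [s.snp_id for s in snps if passes_pre_imputation_qc(s)]
print(kept)
```

Because the distributed imputed SNPs are not filtered for imputation quality or minor allele count, analysts would typically apply their own post-imputation thresholds in a similar way.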
Targeted exome sequencing in 256 candidate genes – (n=1,963)
- Sequencing was supported by the NHGRI sequencing centers in response to an application by Dr. Christine Seidman and others.
- Description: Candidate genes were nominated by project investigators based on evidence (from Mendelian families, GWAS, etc.) of involvement in LV remodeling, diabetes, dyslipidemia, dysrhythmia, or hypertension. A custom capture array targeting exons of 256 candidate genes was developed. DNA of 1,637 members of the Framingham Offspring Cohort and 1,963 members of the Jackson Heart Study was sequenced. A list of the targeted genes is available.
- Methods: Sequencing and quality control methods are described in Bick, et al.6
Exome sequencing – (n=3,374)
- Exome sequencing of JHS samples has been performed under four separate projects. Some samples were sequenced in more than one project, so the per-project counts below sum to more than the 3,374 unique samples:
- Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES; NIDDK): n=1,036.
- NHLBI’s Exome Sequencing Project (ESP): n=1,518.
- Minority Health Genomics and Translational Research Bio-repository Database (MH-GRID; NHLBI): n=312.
- Cohorts for Heart and Aging Research in Genomic Epidemiology Sequencing Project (CHARGE-S; performed through the Atherosclerosis Risk in Communities [ARIC] study [NHLBI] among participants included in both JHS and ARIC): n=522.
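As a quick check of how the per-project counts relate to the unique total, the arithmetic can be written out as below. The sketch assumes each project count lists a sample at most once within that project; the variable names are illustrative only.

```python
# Per-project exome sequencing counts as listed above.
project_counts = {
    "T2D-GENES": 1036,
    "ESP": 1518,
    "MH-GRID": 312,
    "CHARGE-S": 522,
}
unique_samples = 3374  # total unique samples stated in the text

total_sequenced = sum(project_counts.values())

# Sequencing records beyond one per sample. A sample sequenced in three
# projects contributes two to this figure, so this is an upper bound on
# the number of samples that appear in more than one project.
repeat_records = total_sequenced - unique_samples

print(total_sequenced, repeat_records)
```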
Methods: Library preparation, target capture, sequencing, variant calling and quality control have been performed at the Broad Institute, the University of Washington, and the Baylor College of Medicine (CHARGE-S) using methods similar to those described for the Exome Sequencing Project (Tennessen et al7). Sample shotgun libraries were captured for exome enrichment using one of three in-solution capture products: CCDS 2008 (~26 Mb), Roche/Nimblegen SeqCap EZ Human Exome Library v1.0 (~32 Mb; Roche Nimblegen EZ Cap v1), or EZ Cap v2 (~34 Mb), and sequencing was performed on Illumina GAIIx or HiSeq 2000 machines.
Joint calling: Sequence data from the four projects listed above were called jointly in the Kathiresan Laboratory at the Broad Institute. Sequence data of all participants were aligned to a human reference genome (hg19) using the Burrows–Wheeler Aligner algorithm. Aligned non-duplicate reads were locally realigned and base qualities were recalibrated using the Genome Analysis ToolKit software. Variants were jointly called using the Genome Analysis ToolKit software and filtered using the Variant Quality Score Recalibration, quality over depth metrics, and strand bias among other metrics.
Exome Chip – (n=2,790)
- Exome Chip genotyping was supported by R01HL107816 to S. Kathiresan.
- Description: The Exome Chip (Illumina Human Exome BeadChip v. 1.0) was developed through the Exome Sequencing Project as a cost-effective method to follow up on low-frequency and rare coding variants observed in the ESP and other exome sequencing studies. Content of the chip was derived from the exomes of 12,031 samples from an array of projects, largely involving participants of European ancestry but also including ~2,000 African Americans.
Selected variants included (n=243,094 designed successfully):
- nonsynonymous variants
- splice variants
- stop gain/loss variants
Variants were required to be observed in at least two studies, except for 8,242 variants that were seen only once and were included for ethnic diversity.
Additional content included (numbers represent variants that designed successfully):
- 5,325 GWAS top SNPs reported by the time of design
- a grid of common variants (n=5,286)
- 4,651 random synonymous variants (including 870 genotyped on both strands)
- 3,241 ancestry informative markers for African ancestry
- 998 ancestry informative markers for Native American ancestry
- 2,459 HLA tags
- 846 ESP “requests”
- 259 fingerprint SNPs
- 270 microRNA target sites
- 246 mitochondrial SNPs
- 128 Y chromosome markers
- 181 indels
- Methods: Genotyping, variant calling, and quality control were performed as described in Grove et al.8
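The content categories listed above can be tallied as in the sketch below. It assumes the categories do not overlap and that the additional content is counted separately from the 243,094 selected coding variants; neither assumption is stated in the text, so the combined figure is indicative only.

```python
# Selected coding content that designed successfully (from the text).
selected_coding_variants = 243_094

# Additional content categories and counts, as listed above.
additional_content = {
    "GWAS top SNPs": 5325,
    "grid of common variants": 5286,
    "random synonymous variants": 4651,
    "African ancestry informative markers": 3241,
    "Native American ancestry informative markers": 998,
    "HLA tags": 2459,
    "ESP requests": 846,
    "fingerprint SNPs": 259,
    "microRNA target sites": 270,
    "mitochondrial SNPs": 246,
    "Y chromosome markers": 128,
    "indels": 181,
}

additional_total = sum(additional_content.values())

# Assumes no variant is counted in more than one category.
combined_total = selected_coding_variants + additional_total

print(additional_total, combined_total)
```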
DNA methylation data in JHS were generated under ancillary study ASN0104 (PI: Reiner A) for 1,757 samples, most of them collected at the baseline exam; samples for 6 participants were drawn at exam 2. Illumina Methylation EPIC array data (covering over 850,000 CpG methylation sites) were generated at the University of Washington, Seattle. Methylation β values (the ratio of intensities between methylated and unmethylated alleles) were normalized with respect to background color intensity using the normal-exponential out-of-band (NOOB) pre-processing method in the R package minfi [PMID: 28035024]. Quality control checks using Horvath’s method [PMID: 24138928] identified outliers by hierarchical clustering, as well as duplicates. After QC, 1,752 participants remained, with a mean age of 56±12 years (range: 22-93 years); 63% were women (see Table).
[Table: Age in Years; Body Mass Index (kg/m^2); Systolic Blood Pressure (mmHg)]
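For readers unfamiliar with methylation β values, the sketch below shows the definition commonly used for Illumina arrays, β = M / (M + U + offset), where M and U are the methylated and unmethylated probe intensities. The formula and the default offset of 100 are the usual convention and are assumed here; the text does not state them, and this sketch does not reproduce the NOOB normalization performed in minfi.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Compute a methylation beta value from two probe intensities.

    Assumes the common convention beta = M / (M + U + offset); the offset
    stabilizes the ratio when both intensities are low.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    denominator = methylated + unmethylated + offset
    if denominator == 0:
        raise ValueError("total intensity plus offset must be positive")
    return methylated / denominator


# Made-up intensities for three CpG sites.
examples = {
    "mostly_methylated": beta_value(9000.0, 900.0),
    "half_methylated": beta_value(450.0, 450.0),
    "unmethylated": beta_value(0.0, 5000.0),
}
for site, beta in examples.items():
    print(site, round(beta, 3))
```

Values fall between 0 (unmethylated) and just under 1 (fully methylated).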
Whole Genome Sequencing – (n=3,406)
- Description: Whole genome sequencing has been performed through NHLBI’s Trans-Omics for Precision Medicine (TOPMed) project at the direction of the Nickerson Laboratory at the University of Washington. The TOPMed project includes >100,000 samples from multiple cohorts, sequenced at >30x depth of coverage, with joint calling of all samples performed by the TOPMed Informatics Resource Center at the University of Michigan. Detailed methods are available at https://www.nhlbiwgs.org/data-sets.
- Status: 3,406 JHS participants with consent for genetic data sharing through dbGaP have been successfully sequenced. Variant calls are available to qualified researchers through dbGaP at study accession phs000964.
Select Genetic Variants Available at JHS for Analysis
Note: Select genetic variants have been genotyped directly on commercial genotyping arrays such as the Exome Chip (Illumina Human Exome BeadChip v. 1.0), IBC Cardiovascular Candidate Gene Array, or Affymetrix 6.0 GWAS array, or assessed by exome sequencing (see references on the website). These include: APOL1 G1 and G2 variants, Duffy null variants of the DARC gene, Hemoglobin C, PCSK9 loss-of-function variants, sickle hemoglobin (rs334), and a functional SCN5A missense variant. Alpha thalassemia-associated deletions have been assessed from whole genome sequence. These data are NOT distributed with the VC package but are available to investigators with approved JHS manuscript proposals through the Data Coordinating Center. For additional details, go to the link provided.
I) APOL1: data on the Apolipoprotein L-1 (APOL1) gene. The derived allele of coding SNP rs73885319 (p.S342G), together with the derived allele of coding SNP rs60910145 (p.I384M), defines the APOL1 G1 allele. The derived allele of indel rs71785313 (p.NYK388K) defines the APOL1 G2 allele. Base pair positions are chr22:36265860(+) and chr22:36265988(+) for the G1 variants and chr22:36266000(+) for the G2 deletion, based on the GRCh38.p7 assembly. JHS sample size: 3224
II) DUFFY: data on SNP rs2814778 (i.e. upstream-variant-2KB, utr-variant-5-prime) in the Atypical Chemokine Receptor 1 gene (Duffy Antigen Receptor for Chemokines [DARC]; Duffy Blood Group antigen). Base pair position is chr1:159204893(-), based on the GRCh38.p7 assembly. JHS sample size: 3027
III) HbC: data on SNP rs33930165 (i.e. reference, missense) in the Hemoglobin Subunit Beta (HBB) gene. Base pair position is chr11:5227003(-), based on the GRCh38.p7 assembly. This variant gives rise to a rare form of hemoglobin, ‘Hb C’. JHS sample size: 3027
IV) PCSK9: data on SNP rs28362286 (i.e. nc-transcript-variant, reference, stop-gained) in the Proprotein convertase subtilisin/kexin type 9 (PCSK9) gene. Base pair position is chr1:55063542(+), based on the GRCh38.p7 assembly. JHS sample size: 3027
V) Sickle cell trait: data on SNP rs334 (i.e. reference, missense) in the Hemoglobin Subunit Beta (HBB) gene. Base pair position is chr11:5227002(-), based on the GRCh38.p7 assembly. This dataset contains the sickle cell trait/disease SNP for 3224 JHS participants.
VI) SCN5A: data on SNP rs7626962 (i.e. intron-variant, reference, missense) in the Sodium Voltage-Gated Channel Alpha Subunit 5 (SCN5A) gene. Base pair position is chr3:38579416(+), based on the GRCh38.p7 assembly. JHS sample size: 3027
VII) TTR: data on SNP rs76992529, a missense variant in the coding sequence of the Transthyretin (TTR) gene. Base pair position is chr18:31598655(+), based on the GRCh38.p7 assembly. The variant results from a G-to-A transition at a CG dinucleotide in the codon for amino acid 122 of the mature TTR protein. Of 3,447 individuals genotyped, 127 carry the minor allele (A). Genotyping and quality control in JHS are detailed in Grove et al. (2013).
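When working with the coordinates listed above, it can help to parse them into structured values before joining them to variant calls. The sketch below is one way to do that; the `Locus` type and `parse_locus` function are hypothetical helpers, not part of any JHS distribution. The parser tolerates optional whitespace and thousands separators, which are common in hand-entered coordinates.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    """A single-base position: chromosome label, 1-based position, strand."""
    chrom: str
    pos: int
    strand: str


# Matches strings such as "chr11:5227002(-)" or "chr18: 31,598,655(+)".
_COORD = re.compile(r"^chr(\w+):\s*([\d,]+)\s*\(([+-])\)$")


def parse_locus(text: str) -> Locus:
    """Parse a 'chrN:position(strand)' string into a Locus."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    chrom, pos, strand = match.groups()
    return Locus(chrom, int(pos.replace(",", "")), strand)


# Coordinates taken from the list above (GRCh38.p7); spacing and separators
# are varied on purpose to exercise the parser.
selected_variants = {
    "rs2814778": "chr1: 159204893(-)",   # DUFFY
    "rs28362286": "chr1:55063542(+)",    # PCSK9
    "rs334": "chr11:5227002(-)",         # sickle hemoglobin
    "rs7626962": "chr3: 38579416(+)",    # SCN5A
    "rs76992529": "chr18:31,598,655(+)", # TTR
}

loci = {rsid: parse_locus(coord) for rsid, coord in selected_variants.items()}
for rsid, locus in loci.items():
    print(rsid, locus.chrom, locus.pos, locus.strand)
```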
1. Nalls MA, Wilson JG, Patterson NJ, Tandon A, Zmuda JM, Huntsman S, Garcia M, Hu D, Li R, Beamer BA, Patel KV, Akylbekova EL, Files JC, Hardy CL, Buxbaum SG, Taylor HA, Reich D, Harris TB, Ziv E. Admixture mapping of white cell count: genetic locus responsible for lower white blood cell count in the Health ABC and Jackson Heart studies. Am J Hum Genet. 2008 Jan;82(1):81-7. Erratum in: Am J Hum Genet. 2008 Feb;82(2):532. PMC2253985.
2. Keating BJ, Tischfield S, Murray SS, Bhangale T, Price TS, Glessner JT, Galver L, Barrett JC, Grant SF, Farlow DN, Chandrupatla HR, Hansen M, Ajmal S, Papanicolaou GJ, Guo Y, Li M, Derohannessian S, de Bakker PI, Bailey SD, Montpetit A, Edmondson AC, Taylor K, Gai X, Wang SS, Fornage M, Shaikh T, Groop L, Boehnke M, Hall AS, Hattersley AT, Frackelton E, Patterson N, Chiang CW, Kim CE, Fabsitz RR, Ouwehand W, Price AL, Munroe P, Caulfield M, Drake T, Boerwinkle E, Reich D, Whitehead AS, Cappola TP, Samani NJ, Lusis AJ, Schadt E, Wilson JG, Koenig W, McCarthy MI, Kathiresan S, Gabriel SB, Hakonarson H, Anand SS, Reilly M, Engert JC, Nickerson DA, Rader DJ, Hirschhorn JN, Fitzgerald GA. Concept, design and implementation of a cardiovascular gene-centric 50 k SNP array for large-scale genomic association studies. PLoS ONE. 2008;3(10):e3583. Epub 2008 Oct 31. PMC2571995.
3. Musunuru K, Lettre G, Young T, Farlow DN, Pirruccello JP, Ejebe KG, Keating BJ, Yang Q, Chen MH, Lapchyk N, Crenshaw A, Ziaugra L, Rachupka A, Benjamin EJ, Cupples LA, Fornage M, Fox ER, Heckbert SR, Hirschhorn JN, Newton-Cheh C, Nizzari MM, Paltoo DN, Papanicolaou GJ, Patel SR, Psaty BM, Rader DJ, Redline S, Rich SS, Rotter JI, Taylor HA Jr, Tracy RP, Vasan RS, Wilson JG, Kathiresan S, Fabsitz RR, Boerwinkle E, Gabriel SB; NHLBI Candidate Gene Association Resource. Candidate gene association resource (CARe): design, methods, and proof of concept. Circ Cardiovasc Genet. 2010 Jun 1;3(3):267-75. Epub 2010 Apr 17. PMID: 20400780.
4. Lettre G, Palmer CD, Young T, Ejebe KG, Allayee H, Benjamin EJ, Bennett F, Bowden DW, Chakravarti A, Dreisbach A, Farlow DN, Folsom AR, Fornage M, Forrester T, Fox E, Haiman CA, Hartiala J, Harris TB, Hazen SL, Heckbert SR, Henderson BE, Hirschhorn JN, Keating BJ, Kritchevsky SB, Larkin E, Li M, Rudock ME, McKenzie CA, Meigs JB, Meng YA, Mosley TH, Newman AB, Newton-Cheh CH, Paltoo DN, Papanicolaou GJ, Patterson N, Post WS, Psaty BM, Qasim AN, Qu L, Rader DJ, Redline S, Reilly MP, Reiner AP, Rich SS, Rotter JI, Liu Y, Shrader P, Siscovick DS, Tang WH, Taylor HA, Tracy RP, Vasan RS, Waters KM, Wilks R, Wilson JG, Fabsitz RR, Gabriel SB, Kathiresan S, Boerwinkle E. Genome-Wide Association Study of Coronary Heart Disease and Its Risk Factors in 8,090 African Americans: The NHLBI CARe Project. PLoS Genet. 2011 Feb 10;7(2):e1001300. PMCID: PMC3037413.
5. Duan Q, Liu EY, Auer PL, Zhang G, Lange EM, Jun G, Bizon C, Jiao S, Buyske S, Franceschini N, Carlson CS, Hsu L, Reiner AP, Peters U, Haessler J, Curtis K, Wassel CL, Robinson JG, Martin LW, Haiman CA, Le Marchand L, Matise TC, Hindorff LA, Crawford DC, Assimes TL, Kang HM, Heiss G, Jackson RD, Kooperberg C, Wilson JG, Abecasis GR, North KE, Nickerson DA, Lange LA, Li Y. Imputation of Coding Variants in African Americans: Better Performance using Data from the Exome Sequencing Project. Bioinformatics. 2013 Aug 16. [Epub ahead of print] PMID: 23956302
6. Bick AG, Flannick J, Ito K, Cheng S, Vasan RS, Parfenov MG, Herman DS, Depalma SR, Gupta N, Gabriel SB, Funke BH, Rehm HL, Benjamin EJ, Aragam J, Taylor HA Jr, Fox ER, Newton-Cheh C, Kathiresan S, O'Donnell CJ, Wilson JG, Altshuler DM, Hirschhorn JN, Seidman JG, Seidman C. Burden of rare sarcomere gene variants in the Framingham and Jackson Heart Study cohorts. Am J Hum Genet. 2012 Sep 7; 91(3):513-9. PMID: 22958901. PMCID:PMC3511985.
7. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM; Broad GO; Seattle GO; NHLBI Exome Sequencing Project. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012 Jul 6;337(6090):64-9. doi: 10.1126/science.1219240. Epub 2012 May 17. PMID: 22604720
8. Grove ML, Yu B, Cochran BJ, Haritunians T, Bis JC, Taylor KD, Hansen M, Borecki IB, Cupples LA, Fornage M, Gudnason V, Harris TB, Kathiresan S, Kraaij R, Launer LJ, Levy D, Liu Y, Mosley T, Peloso GM, Psaty BM, Rich SS, Rivadeneira F, Siscovick DS, Smith AV, Uitterlinden A, van Duijn CM, Wilson JG, O'Donnell CJ, Rotter JI, Boerwinkle E. Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium. PLoS One. 2013 Jul 12;8(7):e68095. doi: 10.1371/journal.pone.0068095. Print 2013. PMID: 23874508. PMCID: PMC3709915