GENETICS

JHS PRS Statement

Polygenic risk scores (PRS) can provide valuable information and ideally should be applicable across diverse populations. JHS investigators expect to contribute data to numerous efforts to develop PRS, including some that may resemble each other in their target phenotypes, their approach, and the populations used for PRS development and validation. Thus, a decision to participate in one such effort will not represent a commitment of exclusivity. Rather, issues of potential overlap and redundancy will be managed through the JHS Genetics Working Group and the Publications and Presentations Subcommittee in collaboration with the participating investigators. The management of overlap in genetic analyses other than the development of PRS is unchanged and also occurs through the JHS Genetics Working Group, the Publications and Presentations Subcommittee, and, as appropriate, TOPMed processes.

Genetics | Overview

Many features of each person’s appearance, behavior, and health are passed down from his or her ancestors. The material that carries this information from one generation to the next is called DNA, and DNA of different people can now be analyzed to find differences that affect health and disease. The Jackson Heart Study is the largest study in history to investigate the inherited (genetic) factors that affect high blood pressure, heart disease, strokes, diabetes and other important diseases in African Americans. DNA has been obtained from every consenting participant in the Jackson Heart Study, and is being analyzed for many thousands of differences between people that may affect their health. These studies are likely to lead to the development of new treatments that do more good and less harm than treatments that are available today.

Click here to view Genetics Brochure

Family | Studies

The Jackson Heart Study (JHS) Family Study is designed to detect new genes influencing the risk factors for a variety of heart, lung, and blood disorders. Enrollment of families was limited to the relatives of those who had already become JHS participants, so that the families would resemble the overall JHS population. This makes it more likely that results obtained in the family study will be meaningful for the greater JHS community.

Consent Tiers For Family Study Component Table (Updated: March 26, 2012)
Click here to view Table

Genetics | Research Overview

The Jackson Heart Study cohort of more than 5,300 participants includes a nested family cohort that was recruited from among the relatives of “index participants” who had at least two siblings and four other first-degree relatives residing in the recruitment area. Further, the general recruitment strategy for JHS was household-based, resulting in additional relatedness among study participants. Pedigree structure has been determined by incorporating both family history and molecular markers, and families vary in size and structure from sibling and parent-child pairs to large, multi-generational families. Overall there are 428 pedigrees with mean family size of 4.8, median family size of 3, and maximum family size of 32. The figure to the right shows the overall distribution of family size.

Number of members per family is shown on the x-axis and percent of families by size is shown on the y-axis.

Relationship | Pairs

DATA & SAMPLES

JHS data and samples have been incorporated into the work of multiple genetics consortia, and genotype and phenotype data are available through dbGaP.

Genotype and sequencing data that have been obtained and are available either through dbGaP or by an approved data request to JHS are listed in the sections below. Genetic analyses have been performed in accordance with participant consent. Thus the samples analyzed by the various platforms are largely overlapping.

Microsatellite markers – 1,486 members of 264 families. Marshfield Marker Set 16.

Ancestry Informative Marker Panel – (n=4,605)

Description: Approximately 1,500 genome wide markers that are highly differentiated in frequency in Europeans compared to Africans. These markers provide less information than Affymetrix 6.0 GWAS genotypes, but are available in a larger sample, perhaps allowing, for example, imputation of rare variants of interest into related individuals.
Methods: Genotyping methods and quality control are described in Nalls, et al.¹

IBC Cardiovascular Candidate Gene Array²,³ – (n=2,948)

Genotyping was performed through NHLBI’s Candidate Gene Association Resource (CARe) consortium.
Description: Gene-centric array interrogating ~55,000 SNPs selected to tag 2,100 CVD candidate genes chosen based on biologic function, involvement in CVD-related Mendelian syndromes, GWAS results, and other criteria.
Methods: Design of the IBC Array is described in Keating et al.²; genotyping and quality control, as well as organizational features of the CARe consortium, are described in Musunuru, et al.³ A list of the genes and SNPs on the IBC Array is available on request.

Affymetrix 6.0 GWAS Genotyping – (n=3,029)

Genotyping was performed through NHLBI’s Candidate Gene Association Resource (CARe) consortium.³
Description: >906,600 genome-wide tag SNPs and >946,000 probes for copy-number variation.
Methods: Genotyping and quality control are described in Lettre, et al.⁴ Data have been imputed to the 1000 Genomes phase 1 v3 reference panel as described in Duan, et al.⁵

1000 Genomes Phase 3 – (n=3,029)

1000G Phase 3 Imputed Data: VCF files of dosage and likely genotypes for autosomal imputed SNPs from the 1000 Genomes Project (1000G) Phase 3 version 5 reference panel. Imputation was completed using Minimac3 on the Michigan Imputation Server (PMID 27571263). The reference panel includes 5,008 haplotypes from 26 populations across the world (http://www.internationalgenome.org). Prior to imputation, SNPs were filtered for minor allele frequency ≥1%, call rate ≥90%, and HWE p-value >10⁻⁶, and sites with invalid or mismatched alleles relative to the reference panel were excluded.

1) JHS Sample size (N) = 3,029 (includes 9 samples that are recommended to be excluded, based on quality control issues such as sex or pedigree mismatches)
2) Total SNPs imputed = 49,143,605 (not filtered for imputation quality or minor allele count)
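
The pre-imputation filter described above can be illustrated with a minimal Python sketch. It applies only the three stated thresholds (minor allele frequency ≥1%, call rate ≥90%, HWE p-value >10⁻⁶). The `SnpQc` record, the function names, and the example SNPs are hypothetical and are not part of the JHS pipeline; the check for invalid or mismatched alleles against the reference panel is not shown.

```python
from dataclasses import dataclass

# Thresholds as stated in the text above.
MAF_MIN = 0.01        # minor allele frequency >= 1%
CALL_RATE_MIN = 0.90  # call rate >= 90%
HWE_P_MIN = 1e-6      # Hardy-Weinberg p-value must exceed 10^-6


@dataclass
class SnpQc:
    """Hypothetical per-SNP summary statistics (illustrative only)."""
    snp_id: str
    maf: float        # minor allele frequency, 0 to 0.5
    call_rate: float  # fraction of samples with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value


def passes_pre_imputation_qc(snp: SnpQc) -> bool:
    """Return True if the SNP meets all three stated thresholds."""
    return (
        snp.maf >= MAF_MIN
        and snp.call_rate >= CALL_RATE_MIN
        and snp.hwe_p > HWE_P_MIN
    )


def filter_snps(snps):
    """Keep only the SNPs that pass the pre-imputation filter."""
    return [s for s in snps if passes_pre_imputation_qc(s)]


# Made-up SNPs: one passes, and each of the others fails one criterion.
example_snps = [
    SnpQc("snp_a", maf=0.12, call_rate=0.99, hwe_p=0.40),
    SnpQc("snp_b", maf=0.004, call_rate=0.99, hwe_p=0.40),  # MAF too low
    SnpQc("snp_c", maf=0.12, call_rate=0.85, hwe_p=0.40),   # call rate too low
    SnpQc("snp_d", maf=0.12, call_rate=0.99, hwe_p=1e-9),   # fails HWE
]
kept = filter_snps(example_snps)
print([s.snp_id for s in kept])
```

In practice such filtering is usually done with dedicated genotype tools on the full dataset; the sketch only makes the inclusion logic explicit.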

Targeted exome sequencing in 256 candidate genes – (n=1,963)

Sequencing was supported by the NHGRI sequencing centers in response to an application by Dr. Christine Seidman and others.
Description: Candidate genes were nominated by project investigators based on evidence (from Mendelian families, GWAS, etc.) of involvement in LV remodeling, diabetes, dyslipidemia, dysrhythmia, or hypertension. A custom capture array targeting exons of 256 candidate genes was developed. DNA of 1,637 members of the Framingham Offspring Cohort and 1,963 members of the Jackson Heart Study was sequenced. A list of the targeted genes is available.
Methods: Sequencing and quality control methods are described in Bick, et al.⁶

Exome sequencing – (n=3,374)

Exome sequencing of JHS samples has been performed under four separate projects. The total of 3,374 unique samples includes some samples that were sequenced in more than one project:

- Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES; NIDDK): n=1,036.

- NHLBI’s Exome Sequencing Project (ESP): n=1,518.

- Minority Health Genomics and Translational Research Bio-repository Database (MH-GRID; NHLBI): n=312.

- Cohorts for Heart and Aging Research in Genomic Epidemiology Sequencing Project (CHARGE-S; performed through the Atherosclerosis Risk in Communities [ARIC] study [NHLBI] among participants included in both JHS and ARIC): n=522.

Methods: Library preparation, target capture, sequencing, variant calling and quality control have been performed at the Broad Institute, the University of Washington, and the Baylor College of Medicine (CHARGE-S) using methods similar to those described for the Exome Sequencing Project (Tennessen et al⁷). Sample shotgun libraries were captured for exome enrichment using one of three in-solution capture products: CCDS 2008 (~26 Mb), Roche/Nimblegen SeqCap EZ Human Exome Library v1.0 (~32 Mb; Roche Nimblegen EZ Cap v1), or EZ Cap v2 (~34 Mb), and sequencing was performed on Illumina GAIIx or HiSeq 2000 machines.
Joint calling: Sequence data from the four projects listed above were called jointly in the Kathiresan Laboratory at the Broad Institute. Sequence data of all participants were aligned to a human reference genome (hg19) using the Burrows–Wheeler Aligner algorithm. Aligned non-duplicate reads were locally realigned and base qualities were recalibrated using the Genome Analysis ToolKit software. Variants were jointly called using the Genome Analysis ToolKit software and filtered using the Variant Quality Score Recalibration, quality over depth metrics, and strand bias among other metrics.
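
As a worked check on the exome sequencing counts above, the four per-project sample sizes sum to more than the 3,374 unique samples, which is consistent with some samples having been sequenced in more than one project. The sketch below does that arithmetic and shows, on made-up sample IDs, how a unique count can be derived as a set union. The project keys mirror the list above; the helper name and the toy IDs are hypothetical.

```python
# Per-project counts and the unique total, as listed in the text above.
project_counts = {
    "T2D-GENES": 1036,
    "ESP": 1518,
    "MH-GRID": 312,
    "CHARGE-S": 522,
}
UNIQUE_SAMPLES = 3374

total_sequenced = sum(project_counts.values())
# Sequencing records beyond one per unique sample. This counts repeat
# records, not necessarily the number of distinct repeated samples.
repeat_records = total_sequenced - UNIQUE_SAMPLES
print(total_sequenced, repeat_records)


def count_unique(samples_by_project):
    """Count distinct sample IDs across projects (set union)."""
    return len(set().union(*samples_by_project.values()))


# Toy illustration with made-up IDs: "s2" appears in both projects.
toy = {"project_1": {"s1", "s2"}, "project_2": {"s2", "s3"}}
print(count_unique(toy))
```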

Exome Chip – (n=2,790)

Exome Chip genotyping was supported by R01HL107816 to S. Kathiresan.
Description: The Exome Chip (Illumina Human Exome BeadChip v. 1.0) was developed through the Exome Sequencing Project as a cost-effective method to follow up on low-frequency and rare coding variants observed in the ESP and other exome sequencing studies. Content of the chip was derived from the exomes of 12,031 samples from an array of projects, largely involving participants of European ancestry but also including ~2,000 African Americans.

Selected variants included (n=243,094 designed successfully):

- nonsynonymous variants

- splice variants

- stop gain/loss variants

Variants were observed in at least two studies, except for 8,242 variants that were seen only once and included for ethnic diversity.

Additional content included (numbers represent variants that designed successfully):

- 5,325 GWAS top SNPs reported by the time of design

- a grid of common variants (n=5,286)

- 4,651 random synonymous variants (including 870 genotyped on both strands)

- 3,241 ancestry informative markers for African ancestry

- 998 ancestry informative markers for Native American ancestry

- 2,459 HLA tags

- 846 ESP “requests”

- 259 fingerprint SNPs

- 270 Micro RNA Target Sites

- 246 mitochondrial SNPs

- 128 Y chromosome markers

- 181 Indels

Methods: Genotyping, variant calling, and quality control were performed as described in Grove et al.⁸

DNA Methylation

DNA methylation data in JHS were generated under ancillary study ASN0104 (PI: Reiner A) for 1,757 samples, most of them collected at the baseline exam and 6 drawn at exam 2. Illumina MethylationEPIC array data (covering over 850,000 CpG methylation sites) were generated at the University of Washington, Seattle. Methylation β values (the methylated signal intensity as a proportion of the combined methylated and unmethylated intensities) were normalized with respect to background color intensity using the normal-exponential out-of-band (NOOB) pre-processing method in the R package minfi [PMID: 28035024]. Quality control checks performed using Horvath’s method [PMID: 24138928] identified outliers by hierarchical clustering, as well as duplicates. After QC, 1,752 participants remained, with a mean age of 56±12 years (range: 22-93 years); 63% were women (see Table).

Study	Visit	Label	N	Mean	Std Dev	Min	Max
ASN0104	1	Age in Years	1746	55.73	12.33	22.00	93.00
ASN0104	1	Body Mass Index (kg/m^2)	1745	32.02	7.37	16.02	75.05
ASN0104	1	Systolic Blood Pressure (mmHg)	1742	128.03	16.32	88.07	203.60
ASN0104	2	Age in Years	6	63.50	17.46	40.00	76.00
ASN0104	2	Body Mass Index (kg/m^2)	5	33.98	2.75	30.08	37.37
ASN0104	2	Systolic Blood Pressure (mmHg)	6	125.17	31.05	102.00	184.00
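
To make the methylation β value mentioned above concrete, the following minimal sketch computes β from a methylated and an unmethylated probe intensity. It assumes the commonly used form β = M / (M + U + offset); the offset of 100 is a conventional stabilizing constant and is an assumption here, since the text does not state which offset, if any, was used for the JHS data. The function name is hypothetical, and the sketch does not reproduce the NOOB normalization performed in minfi.

```python
import math


def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Methylation beta value: M / (M + U + offset).

    `offset` is a small constant that stabilizes the ratio when both
    intensities are low; 100 is a conventional choice and is an
    assumption of this sketch, not a documented JHS setting.
    """
    if methylated < 0 or unmethylated < 0 or offset < 0:
        raise ValueError("intensities and offset must be non-negative")
    denominator = methylated + unmethylated + offset
    if denominator == 0:
        return math.nan  # no signal at all: beta is undefined
    return methylated / denominator


# Made-up intensities for three probes.
print(beta_value(4500.0, 400.0))            # mostly methylated
print(beta_value(300.0, 5200.0))            # mostly unmethylated
print(beta_value(500.0, 500.0, offset=0.0)) # equal signal, no offset
```

β values lie between 0 (unmethylated) and 1 (fully methylated), which is why they can be read as an approximate proportion of methylated copies at a CpG site.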

Whole Genome Sequencing – (n=3,406)

Description: Whole genome sequencing has been performed through NHLBI’s Trans-Omics for Precision Medicine (TOPMed) project at the direction of the Nickerson Laboratory at the University of Washington. The TOPMed project includes >100,000 samples from multiple cohorts, sequenced at >30x depth of coverage, with joint calling of all samples performed by the TOPMed Informatics Resource Center at the University of Michigan. Detailed methods are available at https://www.nhlbiwgs.org/data-sets.
Status: 3,406 JHS participants with consent for genetic data sharing through dbGaP have been successfully sequenced. Variant calls are available to qualified researchers through dbGaP at study accession phs000964.

Select Genetic Variants Available at JHS for Analysis

Note: Select genetic variants have been genotyped directly on commercial genotyping arrays such as the Exome Chip (Illumina Human Exome BeadChip v. 1.0), IBC Cardiovascular Candidate Gene Array, or Affymetrix 6.0 GWAS array, or assessed by Exome Sequencing (see references on the website). These include: APOL1 G1 and G2 variants, Duffy null variants of the DARC gene, Hemoglobin C, PCSK9 loss-of-function variants, sickle hemoglobin (rs334), and a functional SCN5A missense variant. Alpha thalassemia-associated deletions have been assessed from whole genome sequence. These data are NOT distributed with the VC package but are available to investigators with approved JHS manuscript proposals through the Data Coordinating Center. For additional details, go to the link provided.

I) APOL1: data on the Apolipoprotein L-1 (APOL1) gene. The derived allele of coding SNP rs73885319 (p.S342G), together with the derived allele of coding SNP rs60910145 (p.I384M), defines the APOL1 G1 allele. The derived allele of indel rs71785313 (p.NYK388K) defines the APOL1 G2 allele. Base pair positions are chr22:36265860(+) and chr22:36265988(+) for the G1 variants and chr22:36266000(+) for the G2 deletion, based on GRCh38.p7 assembly. JHS sample size: 3224

II) DUFFY: data on rs2814778 SNP (i.e. upstream-variant-2KB, utr-variant-5-prime) in Atypical Chemokine Receptor 1 (Duffy Antigen Receptor for Chemokines [DARC]; Duffy Blood Group antigen). Base pair position is chr1:159204893(-) based on GRCh38.p7 assembly. JHS sample size: 3027

III) HbC: data on rs33930165 SNP (i.e. reference, missense) in Hemoglobin Subunit Beta (HBB) gene. Base pair position is chr11:5227003(-) based on GRCh38.p7 assembly. Gives rise to a rare form of hemoglobin, ‘Hb C’. JHS sample size: 3027

IV) PCSK9: data on rs28362286 SNP (i.e. nc-transcript-variant, reference, stop-gained) in Proprotein convertase subtilisin/kexin type 9 (PCSK9) gene. Base pair position is chr1:55063542(+) based on GRCh38.p7 assembly. JHS sample size: 3027

V) Sickle cell trait: data on rs334 SNP (i.e. reference, missense) in Hemoglobin Subunit Beta (HBB) gene. Base pair position is chr11:5227002(-) based on GRCh38.p7 assembly. This dataset contains the sickle cell trait/disease SNP for 3224 JHS participants.

VI) SCN5A: data on rs7626962 SNP (i.e. intron-variant, reference, missense) in Sodium Voltage-Gated Channel Alpha Subunit 5 (SCN5A) gene. Base pair position is chr3:38579416(+) based on GRCh38.p7 assembly. JHS sample size: 3027

VII) TTR: data on rs76992529 SNP (a coding-sequence missense variant) in the Transthyretin (TTR) gene. Base pair position is chr18:31598655(+) based on GRCh38.p7 assembly. The variant results from a G to A transition at a CG dinucleotide in the codon for amino acid 122 of the mature TTR protein. Of 3,447 individuals genotyped, 127 carry the minor allele (A). Genotyping in JHS and quality control are detailed in Grove et al. (2013).⁸
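
For analysts who want to look up the variants listed above programmatically, the coordinates can be collected in a small table. The sketch below is one possible representation, not a JHS-provided file format: the dictionary layout, the helper `region_string`, and the `chrom:pos-pos` region convention are assumptions made for illustration. Positions are the GRCh38.p7 coordinates given in the text, and the last lines compute the TTR carrier proportion from the counts stated above.

```python
# rsID -> (gene or label, chromosome, GRCh38.p7 position), from the text above.
SELECT_VARIANTS = {
    # All three APOL1 variants lie on chromosome 22.
    "rs73885319": ("APOL1 G1", "chr22", 36265860),
    "rs60910145": ("APOL1 G1", "chr22", 36265988),
    "rs71785313": ("APOL1 G2", "chr22", 36266000),
    "rs2814778": ("DARC (Duffy)", "chr1", 159204893),
    "rs33930165": ("HBB (Hb C)", "chr11", 5227003),
    "rs28362286": ("PCSK9", "chr1", 55063542),
    "rs334": ("HBB (sickle)", "chr11", 5227002),
    "rs7626962": ("SCN5A", "chr3", 38579416),
    "rs76992529": ("TTR", "chr18", 31598655),
}


def region_string(rsid: str) -> str:
    """Format a single-base region such as 'chr11:5227002-5227002'.

    Raises KeyError if the rsID is not in the table.
    """
    _, chrom, pos = SELECT_VARIANTS[rsid]
    return f"{chrom}:{pos}-{pos}"


print(region_string("rs334"))

# TTR: 127 carriers of the minor allele among 3,447 genotyped individuals.
ttr_carrier_fraction = 127 / 3447
print(f"{ttr_carrier_fraction:.2%}")
```

Note that the carrier proportion is not the same as the minor allele frequency, which would also require the number of homozygous carriers.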

References:

1. Nalls MA, Wilson JG, Patterson NJ, Tandon A, Zmuda JM, Huntsman S, Garcia M, Hu D, Li R, Beamer BA, Patel KV, Akylbekova EL, Files JC, Hardy CL, Buxbaum SG, Taylor HA, Reich D, Harris TB, Ziv E. Admixture mapping of white cell count: genetic locus responsible for lower white blood cell count in the Health ABC and Jackson Heart studies. Am J Hum Genet. 2008 Jan;82(1):81-7. Erratum in: Am J Hum Genet. 2008 Feb;82(2):532. PMC2253985.

2. Keating BJ, Tischfield S, Murray SS, Bhangale T, Price TS, Glessner JT, Galver L, Barrett JC, Grant SF, Farlow DN, Chandrupatla HR, Hansen M, Ajmal S, Papanicolaou GJ, Guo Y, Li M, Derohannessian S, de Bakker PI, Bailey SD, Montpetit A, Edmondson AC, Taylor K, Gai X, Wang SS, Fornage M, Shaikh T, Groop L, Boehnke M, Hall AS, Hattersley AT, Frackelton E, Patterson N, Chiang CW, Kim CE, Fabsitz RR, Ouwehand W, Price AL, Munroe P, Caulfield M, Drake T, Boerwinkle E, Reich D, Whitehead AS, Cappola TP, Samani NJ, Lusis AJ, Schadt E, Wilson JG, Koenig W, McCarthy MI, Kathiresan S, Gabriel SB, Hakonarson H, Anand SS, Reilly M, Engert JC, Nickerson DA, Rader DJ, Hirschhorn JN, Fitzgerald GA. Concept, design and implementation of a cardiovascular gene-centric 50 k SNP array for large-scale genomic association studies. PLoS ONE. 2008;3(10):e3583. Epub 2008 Oct 31. PMC2571995.

3. Musunuru K, Lettre G, Young T, Farlow DN, Pirruccello JP, Ejebe KG, Keating BJ, Yang Q, Chen MH, Lapchyk N, Crenshaw A, Ziaugra L, Rachupka A, Benjamin EJ, Cupples LA, Fornage M, Fox ER, Heckbert SR, Hirschhorn JN, Newton-Cheh C, Nizzari MM, Paltoo DN, Papanicolaou GJ, Patel SR, Psaty BM, Rader DJ, Redline S, Rich SS, Rotter JI, Taylor HA Jr, Tracy RP, Vasan RS, Wilson JG, Kathiresan S, Fabsitz RR, Boerwinkle E, Gabriel SB; NHLBI Candidate Gene Association Resource. Candidate gene association resource (CARe): design, methods, and proof of concept. Circ Cardiovasc Genet. 2010 Jun 1;3(3):267-75. Epub 2010 Apr 17. PMID: 20400780.

4. Lettre G, Palmer CD, Young T, Ejebe KG, Allayee H, Benjamin EJ, Bennett F, Bowden DW, Chakravarti A, Dreisbach A, Farlow DN, Folsom AR, Fornage M, Forrester T, Fox E, Haiman CA, Hartiala J, Harris TB, Hazen SL, Heckbert SR, Henderson BE, Hirschhorn JN, Keating BJ, Kritchevsky SB, Larkin E, Li M, Rudock ME, McKenzie CA, Meigs JB, Meng YA, Mosley TH, Newman AB, Newton-Cheh CH, Paltoo DN, Papanicolaou GJ, Patterson N, Post WS, Psaty BM, Qasim AN, Qu L, Rader DJ, Redline S, Reilly MP, Reiner AP, Rich SS, Rotter JI, Liu Y, Shrader P, Siscovick DS, Tang WH, Taylor HA, Tracy RP, Vasan RS, Waters KM, Wilks R, Wilson JG, Fabsitz RR, Gabriel SB, Kathiresan S, Boerwinkle E. Genome-Wide Association Study of Coronary Heart Disease and Its Risk Factors in 8,090 African Americans: The NHLBI CARe Project. PLoS Genet. 2011 Feb 10;7(2):e1001300. PMCID: PMC3037413.

5. Duan Q, Liu EY, Auer PL, Zhang G, Lange EM, Jun G, Bizon C, Jiao S, Buyske S, Franceschini N, Carlson CS, Hsu L, Reiner AP, Peters U, Haessler J, Curtis K, Wassel CL, Robinson JG, Martin LW, Haiman CA, Le Marchand L, Matise TC, Hindorff LA, Crawford DC, Assimes TL, Kang HM, Heiss G, Jackson RD, Kooperberg C, Wilson JG, Abecasis GR, North KE, Nickerson DA, Lange LA, Li Y. Imputation of Coding Variants in African Americans: Better Performance using Data from the Exome Sequencing Project. Bioinformatics. 2013 Aug 16. [Epub ahead of print] PMID: 23956302

6. Bick AG, Flannick J, Ito K, Cheng S, Vasan RS, Parfenov MG, Herman DS, Depalma SR, Gupta N, Gabriel SB, Funke BH, Rehm HL, Benjamin EJ, Aragam J, Taylor HA Jr, Fox ER, Newton-Cheh C, Kathiresan S, O'Donnell CJ, Wilson JG, Altshuler DM, Hirschhorn JN, Seidman JG, Seidman C. Burden of rare sarcomere gene variants in the Framingham and Jackson Heart Study cohorts. Am J Hum Genet. 2012 Sep 7; 91(3):513-9. PMID: 22958901. PMCID:PMC3511985.

7. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM; Broad GO; Seattle GO; NHLBI Exome Sequencing Project. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012 Jul 6;337(6090):64-9. doi: 10.1126/science.1219240. Epub 2012 May 17. PMID: 22604720

8. Grove ML, Yu B, Cochran BJ, Haritunians T, Bis JC, Taylor KD, Hansen M, Borecki IB, Cupples LA, Fornage M, Gudnason V, Harris TB, Kathiresan S, Kraaij R, Launer LJ, Levy D, Liu Y, Mosley T, Peloso GM, Psaty BM, Rich SS, Rivadeneira F, Siscovick DS, Smith AV, Uitterlinden A, van Duijn CM, Wilson JG, O'Donnell CJ, Rotter JI, Boerwinkle E. Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium. PLoS One. 2013 Jul 12;8(7):e68095. doi: 10.1371/journal.pone.0068095. Print 2013. PMID: 23874508. PMCID: PMC3709915