Citation
Katki, H. A., Sanders, C. L., Graubard, B. I., & Bergen, A. W. (2010). Using DNA fingerprints to infer familial relationships within NHANES III households. Journal of the American Statistical Association, 105(490), 552-563.
Abstract
Developing, targeting, and evaluating genomic strategies for population-based disease prevention require population-based data. In response to this urgent need, genotyping has been conducted within the Third National Health and Nutrition Examination Survey (NHANES III), a nationally representative household-interview health survey. However, before these genetic analyses can occur, family relationships within households must be accurately ascertained. Unfortunately, reported family relationships within NHANES III households based on questionnaire data are incomplete and inconclusive with regard to actual biological relatedness of family members. We inferred family relationships within households using DNA fingerprints (Identifiler®) that contain the DNA loci used by law enforcement agencies for forensic identification of individuals. The performance of these loci for relationship inference is not well understood, however. We compared two competing statistical methods for relationship inference on pairs of household members: an exact likelihood ratio that relies on allele frequencies, and an identical-by-state (IBS) likelihood ratio that requires only matching alleles. We modified these methods to account for genotyping errors and population substructure. The two methods usually agree on the rankings of the most likely relationships; however, the IBS method underestimates the likelihood ratio by not accounting for the informativeness of matching rare alleles. The likelihood ratio is sensitive to estimates of population substructure, and parent–child relationships are sensitive to the specified genotyping error rate. These loci were unable to distinguish second-degree relatives and cousins from unrelated individuals. The genetic data are also useful for verifying reported relationships and identifying data quality issues. An important byproduct is the first explicitly nationally representative estimates of allele frequencies at these ubiquitous forensic loci.
Keywords: Allele sharing, Combined DNA index system, Forensics, Identical by descent, Identical by state, Population structure
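As an illustration of the contrast the abstract draws between the two approaches, the following is a minimal single-locus sketch, not the authors' implementation. It assumes Hardy-Weinberg proportions, no genotyping error, and no population substructure (the paper modifies its methods to handle the latter two; this sketch does not). The allele labels, the frequencies in `FREQ`, and all function names are hypothetical. It uses the textbook identity-by-descent formulation, in which a relationship is summarized by the probabilities (k0, k1, k2) of sharing 0, 1, or 2 alleles identical by descent.

```python
from collections import Counter

# Textbook IBD coefficients (k0, k1, k2) for a few relationships.
K = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent_child": (0.0, 1.0, 0.0),
    "full_siblings": (0.25, 0.5, 0.25),
}

# Hypothetical allele frequencies at one locus (they sum to 1).
FREQ = {"A": 0.01, "B": 0.40, "C": 0.20, "D": 0.39}


def genotype_prob(g, freq):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def prob_given_shared(shared, g, freq):
    """P(genotype g | one allele is `shared` by descent, the other is a population draw)."""
    a, b = g
    if shared == a:
        return freq[b]
    if shared == b:
        return freq[a]
    return 0.0


def likelihood_ratio(g1, g2, freq, relationship):
    """LR of `relationship` versus unrelated for one locus; P(g1) cancels out."""
    k0, k1, k2 = K[relationship]
    p2 = genotype_prob(g2, freq)
    one_ibd = 0.5 * prob_given_shared(g1[0], g2, freq) \
        + 0.5 * prob_given_shared(g1[1], g2, freq)
    two_ibd = 1.0 if sorted(g1) == sorted(g2) else 0.0
    return (k0 * p2 + k1 * one_ibd + k2 * two_ibd) / p2


def ibs_count(g1, g2):
    """Number of alleles (0, 1, or 2) the two genotypes share identical by state."""
    return sum((Counter(g1) & Counter(g2)).values())


# Two pairs that each share exactly one allele by state:
rare_pair = (("A", "C"), ("A", "D"))      # shared allele A is rare
common_pair = (("B", "C"), ("B", "D"))    # shared allele B is common

lr_rare = likelihood_ratio(*rare_pair, FREQ, "parent_child")
lr_common = likelihood_ratio(*common_pair, FREQ, "parent_child")
```

Both pairs have the same IBS count, so a statistic based only on matching alleles treats them alike, whereas the frequency-based likelihood ratio is much larger when the shared allele is rare. Under the stated assumptions, per-locus ratios for independent loci would be multiplied to give a multi-locus ratio.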