Quantifying Genetic Similarity
Central to our oligogenic approach to quantifying genetic diversity is the similarity function that
enables one to quantify the genetic distances $d(x_i, x_j)$ between the feature vectors
$x_i$, $x_j$ formed by the allelic patterns of any two subjects $i$, $j$ at loci
$l_1, l_2, \ldots, l_n$. We use a nonmetric set-theoretical
similarity measure that was primarily designed to assess "Similarity By State" (SBS) [Goldstein
et al. 1995; Slatkin 1995; Kimmel et al. 1996; Pritchard and Rosenberg 1999], yet also allows one
to model "Similarity By Descent" (SBD) in a cross-sectional way. The measure is based on a stepwise
mutation model of microsatellites [Kimmel et al. 1996; Chakraborty et al. 1997; Kimmel et al. 1998]
and evaluates the fragment sizes [bp] of microsatellite alleles by analyzing the joint rectangular
"areas" spanned by the two alleles of each microsatellite (multilocus "allelic patterns"). The
overall similarity between the allelic patterns $x_i$, $x_j$ of two subjects $i$, $j$
is quantified through the set-theoretical intersection ("⋂": area shared by the two patterns) and
the set-theoretical union ("⋃": total area involved):

$$ s(x_i, x_j) = \frac{\sum_{k=1}^{n} w_k \, (X_{ik} \cap X_{jk})}{\sum_{k=1}^{n} w_k \, (X_{ik} \cup X_{jk})}, \qquad 0 \le s \le 1, $$

with $w_k$ designating the weight of the feature vector's $k$-th component, and $X_{\cdot k}$
the area spanned by the two alleles $A_{k1}$, $A_{k2}$ of the $k$-th component.
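The weighted intersection-over-union described above can be sketched in Python. The sketch assumes that the "area" $X_{\cdot k}$ of a locus can be modeled as the set of integer base-pair positions between the shorter and the longer allele, inclusive, so that a homozygous locus covers a single position; the text does not pin down this representation, and the names `locus_set` and `similarity` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# One locus: the fragment sizes [bp] of the two alleles A_k1, A_k2.
Locus = Tuple[int, int]


def locus_set(alleles: Locus) -> range:
    """Hypothetical model of X_k: all integer bp positions between the shorter
    and the longer allele (inclusive); a homozygote covers one position."""
    lo, hi = sorted(alleles)
    return range(lo, hi + 1)


def similarity(x_i: Sequence[Locus], x_j: Sequence[Locus],
               weights: Optional[Sequence[float]] = None) -> float:
    """Weighted intersection-over-union of two multilocus allelic patterns."""
    if len(x_i) != len(x_j):
        raise ValueError("patterns must cover the same loci")
    if weights is None:
        weights = [1.0] * len(x_i)  # unit weights
    shared = total = 0.0
    for w, a, b in zip(weights, x_i, x_j):
        set_a, set_b = set(locus_set(a)), set(locus_set(b))
        shared += w * len(set_a & set_b)  # area shared by the two patterns
        total += w * len(set_a | set_b)   # total area involved
    return shared / total
```

For a single locus with alleles (100, 104) versus (102, 106), three positions are shared and seven are involved, giving s = 3/7 ≈ 0.43.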
The weights may either be set to unity or optimized by incorporating the allele frequencies of the
general population, since concordance in a common allele is less informative than concordance in a
rare allele and may therefore deserve less weight.
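How allele frequencies enter the weights is not specified here. One simple, swapped-in choice is to weight each locus by its expected heterozygosity $1 - \sum_a p_a^2$, the probability that two alleles drawn at random from the reference population differ, so that loci dominated by a common allele contribute less. This is a per-locus approximation rather than the per-allele weighting the text suggests, and `locus_weights` and its input layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def allele_frequencies(alleles: Iterable[int]) -> Dict[int, float]:
    """Relative frequency of each fragment size [bp] in a reference sample."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return {allele: c / n for allele, c in counts.items()}


def heterozygosity_weight(freqs: Dict[int, float]) -> float:
    """Illustrative locus weight: expected heterozygosity 1 - sum(p^2),
    i.e. the chance that two randomly drawn alleles differ."""
    return 1.0 - sum(p * p for p in freqs.values())


def locus_weights(reference: List[List[int]]) -> List[float]:
    """One weight w_k per locus; reference[k] lists all alleles observed
    at locus k in the population sample (hypothetical input layout)."""
    return [heterozygosity_weight(allele_frequencies(a)) for a in reference]
```

A monomorphic locus receives weight 0, i.e. concordance there carries no information, while a locus with four equally frequent alleles receives 0.75.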
Interactions Between Genomic Loci
Interactions between genomic loci come into play in this model through the similarity function:
adding a single locus to the configuration may leave the genetic similarities unchanged,
whereas adding pairs of loci may lead to a significant increase or decrease in genetic similarity.
Significant variations in observed genetic similarity can be caused by polymorphisms that are
either functional themselves or located close to a functional polymorphism, where the term "functional"
also encompasses genetic mechanisms not yet discovered.
Biological Ethnicity
Subjects are represented as points in a genetic vector space, positioned such that the mutual
distances between subjects correspond inversely to their genetic similarities. Thus,
genetically similar subjects form compact clouds ("clusters"), while genetically dissimilar subjects
are located in more distant regions. Subjects’ genetic properties, as assessed through a set of
polymorphisms, are represented by their coordinates in a multidimensional vector space. Figure 6
compares the genetic similarities among unrelated subjects with those derived from parent-offspring
comparisons. The two distributions typically differ by two standard deviations for feature
vectors of length n ≥ 15 composed of arbitrarily chosen microsatellites.
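The text does not state how the coordinates are obtained. Classical (Torgerson) multidimensional scaling is one standard way to place points so that their Euclidean distances approximate given dissimilarities; the sketch below uses it under the assumption $d = 1 - s$. Because the similarity measure is nonmetric, the resulting Gram matrix need not be positive semidefinite, so negative eigenvalues are clipped to zero. The function name `embed` is hypothetical.

```python
import numpy as np


def embed(similarity: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS of an n x n similarity matrix with values in [0, 1].

    Assumes d = 1 - s as the distance. Since s is nonmetric, the double-centred
    Gram matrix may have negative eigenvalues; these are clipped to zero.
    """
    s = np.asarray(similarity, dtype=float)
    d2 = (1.0 - s) ** 2                          # squared distances
    n = d2.shape[0]
    centre = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centre @ d2 @ centre           # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]          # indices of the largest ones
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

Each row of the returned array holds one subject's coordinates; with `dims=2` or `dims=3` the clusters of genetically similar subjects can be inspected visually.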
The genetic vector space approach allows one, for example, to structurally decompose a genetically
heterogeneous sample into more homogeneous "natural" subgroups, which give rise to the notion of
"biological ethnicity".
References
Chakraborty R, Kimmel M, Stivers DN, Davison LJ, Deka R (1997): Relative mutation rates at di-, tri-,
and tetranucleotide microsatellite loci. Proc Natl Acad Sci USA 94: 1041-1046
Di Rienzo A, Donnelly P, Toomajian C, Sisk B, Hill A, Petzl-Erler ML, Haines GK, Barch DH (1998):
Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic
histories. Genetics 148: 1269-1284
Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW (1995): An evaluation of genetic distances
for use with microsatellite loci. Genetics 139: 463-471
Kimmel M, Chakraborty R, Stivers DN, Deka R (1996): Dynamics of repeat polymorphisms under a
forward-backward mutation model: within- and between-population variability at microsatellite loci.
Genetics 143: 549-555
Kimmel M, Chakraborty R, King JP, Bamshad M, Watkins WS, Jorde LB (1998): Signatures of population
expansion in microsatellite repeat data. Genetics 148: 1921-1930
Pritchard JK, Rosenberg NA (1999): Use of unlinked genetic markers to detect population stratification
in association studies. Am J Hum Genet 65: 220-228
Slatkin M (1995): A measure of population subdivision based on microsatellite allele frequencies.
Genetics 139: 457-462
Stassen HH, Hoffmann K, Scharfetter C (2003): Similarity by state/descent and genetic vector spaces:
Analysis of a longitudinal family study. In: Almasy L, Amos CI, Bailey-Wilson JE, Cantor RM,
Jaquish CE, Martinez M, Neuman RJ, Olson JM, Palmer LJ, Rich SS, Spence MA, MacCluer JW (eds)
Genetic Analysis Workshop 13: Analysis of longitudinal family data for complex diseases and
related risk factors. BMC Genet 4: S59, 1-6