Institute for Response-Genetics (e.V.), University of Zurich

Quantifying Genetic Similarity

Central to our oligogenic approach to quantifying genetic diversity is the similarity function, which quantifies the genetic distance d(x_i,x_j) between the feature vectors x_i, x_j formed by the allelic patterns of any two subjects i, j at loci l₁, l₂, ..., l_n. We use a nonmetric set-theoretical similarity measure that was designed primarily to assess "Similarity By State" (SBS) [Goldstein et al. 1995; Slatkin 1995; Kimmel et al. 1996; Pritchard and Rosenberg 1999], yet also allows one to model "Similarity By Descent" (SBD) in a cross-sectional way. The measure is based on a stepwise mutation model of microsatellites [Kimmel et al. 1996; Chakraborty et al. 1997; Kimmel et al. 1998] and evaluates the fragment sizes [bp] of microsatellite alleles by analyzing the joint rectangular "areas" spanned by the two alleles of each microsatellite (multilocus "allelic patterns"). The overall similarity between the allelic patterns x_i, x_j of two subjects i, j is quantified through the set-theoretical intersection ("⋂": area shared by the two patterns) and the set-theoretical union ("⋃": total area involved):
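The formula itself is not reproduced in this text. One form consistent with the surrounding definitions, given here as an assumption rather than as the authors' exact expression, is the ratio of the weighted shared area to the weighted total area, summed over the n components of the feature vectors:

```latex
s(x_i, x_j) \;=\;
\frac{\sum_{k=1}^{n} w_k \,\lvert X_{ik} \cap X_{jk} \rvert}
     {\sum_{k=1}^{n} w_k \,\lvert X_{ik} \cup X_{jk} \rvert},
\qquad 0 \le s(x_i, x_j) \le 1
```

In this reading, X_ik and X_jk are the areas spanned by the two alleles of subjects i and j at the k-th locus, and |·| denotes the size of an area. Since every intersection is contained in the corresponding union, the ratio cannot exceed 1 for non-negative weights.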

Here w_k designates the weight of the feature vector's k-th component, and X_.k the area spanned by the two alleles A_k1, A_k2 of the k-th component; the resulting similarity s satisfies 0 ≤ s ≤ 1. The weights may either be set to unity or optimized by incorporating the allele frequencies of the general population, since concordance in a common allele may carry less weight than concordance in a rare allele.
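The following is a minimal sketch of how such an intersection-over-union similarity could be computed. It rests on assumptions not stated in the text: each locus is modelled as a rectangle anchored at the origin whose sides are the two sorted fragment sizes, and the per-locus areas are combined as a ratio of weighted sums. The names `Genotype`, `rect_overlap` and `similarity` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Fragment sizes [bp] of the two alleles at one locus (hypothetical type name).
Genotype = Tuple[float, float]


def rect_overlap(a: Genotype, b: Genotype) -> Tuple[float, float]:
    """Return (intersection, union) areas of two origin-anchored rectangles.

    Assumption: the "area spanned by two alleles" is the rectangle
    [0, shorter allele] x [0, longer allele]; the source does not define it.
    """
    a_lo, a_hi = sorted(a)
    b_lo, b_hi = sorted(b)
    inter = min(a_lo, b_lo) * min(a_hi, b_hi)
    union = a_lo * a_hi + b_lo * b_hi - inter
    return inter, union


def similarity(
    x_i: Sequence[Genotype],
    x_j: Sequence[Genotype],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted shared area divided by weighted total area, in [0, 1]."""
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must cover the same loci")
    if weights is None:
        weights = [1.0] * len(x_i)  # unit weights, as the text allows
    if len(weights) != len(x_i):
        raise ValueError("one weight per locus is required")

    shared = total = 0.0
    for w, a, b in zip(weights, x_i, x_j):
        inter, union = rect_overlap(a, b)
        shared += w * inter
        total += w * union
    return shared / total if total > 0 else 0.0


# Two subjects typed at two loci (made-up fragment sizes).
subject_i: List[Genotype] = [(120, 128), (200, 204)]
subject_j: List[Genotype] = [(124, 128), (196, 204)]
s_unit = similarity(subject_i, subject_j)
s_weighted = similarity(subject_i, subject_j, weights=[2.0, 1.0])
```

Frequency-based weights would simply be passed through the `weights` argument; how they are derived from population allele frequencies is left open here, as the text does not specify it.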

Interactions Between Genomic Loci

Interactions between genomic loci come into play in this model through the similarity function: adding a single locus to the configuration may leave the genetic similarities unchanged, whereas adding pairs of loci may lead to a significant increase or decrease in genetic similarity. Significant variations in observed genetic similarity can be caused by polymorphisms that are either functional themselves or located close to another functional polymorphism, with the term "functional" also encompassing genetic mechanisms not yet discovered.

Biological Ethnicity

Subjects are represented as points in a genetic vector space and positioned so that the mutual distances between subjects correspond inversely to their genetic similarity. Thus, genetically similar subjects form compact clouds ("clusters"), while genetically dissimilar subjects lie in more distant regions. Each subject's genetic properties, as assessed through a set of polymorphisms, are represented by that subject's coordinates in the multidimensional vector space. Figure 6 compares the genetic similarity of unrelated subjects with that derived from parent-offspring comparisons. The respective distributions typically differ by two standard deviations for feature vectors of length ≥ 15 and arbitrarily chosen microsatellites. The genetic vector space approach allows one, for example, to structurally decompose a genetically heterogeneous sample into more homogeneous "natural" subgroups, which give rise to the notion of "biological ethnicity".
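The decomposition of a sample into subgroups can be illustrated with a small sketch. The text does not say which clustering procedure is used, so the choices below are assumptions made purely for illustration: similarities are turned into distances as d = 1 − s, and subjects are linked into the same group whenever a chain of pairwise distances at or below a hypothetical `cutoff` connects them (single-linkage grouping). The similarity matrix is made up.

```python
from typing import Dict, List


def to_distance(sim: List[List[float]]) -> List[List[float]]:
    """Convert similarities in [0, 1] to distances (assumed: d = 1 - s)."""
    return [[1.0 - s for s in row] for row in sim]


def single_linkage_groups(dist: List[List[float]], cutoff: float) -> List[List[int]]:
    """Group subjects connected by chains of distances <= cutoff."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up pairwise similarities for five subjects.
sim = [
    [1.00, 0.95, 0.93, 0.60, 0.62],
    [0.95, 1.00, 0.94, 0.61, 0.59],
    [0.93, 0.94, 1.00, 0.58, 0.60],
    [0.60, 0.61, 0.58, 1.00, 0.96],
    [0.62, 0.59, 0.60, 0.96, 1.00],
]
groups = single_linkage_groups(to_distance(sim), cutoff=0.10)
# groups == [[0, 1, 2], [3, 4]]
```

With this cutoff, subjects 0 to 2 form one compact cluster and subjects 3 and 4 another; a looser cutoff would eventually merge the two.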

References

Chakraborty R, Kimmel M, Stivers DN, Davison LJ, Deka R (1997): Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc Natl Acad Sci USA 94: 1041-1046

Di Rienzo A, Donnelly P, Toomajian C, Sisk B, Hill A, Petzl-Erler ML, Haines GK, Barch DH (1998): Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148: 1269-1284

Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW (1995): An evaluation of genetic distances for use with microsatellite loci. Genetics 139: 463-471

Kimmel M, Chakraborty R, Stivers DN, Deka R (1996): Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143: 549-555

Kimmel M, Chakraborty R, King JP, Bamshad M, Watkins WS, Jorde LB (1998): Signatures of population expansion in microsatellite repeat data. Genetics 148: 1921-1930

Pritchard JK, Rosenberg NA (1999): Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 65: 220-228

Slatkin M (1995): A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462

Stassen HH, Hoffmann K, Scharfetter C (2003): Similarity by state/descent and genetic vector spaces: Analysis of a longitudinal family study. In: Almasy L, Amos CI, Bailey-Wilson JE, Cantor RM, Jaquish CE, Martinez M, Neuman RJ, Olson JM, Palmer LJ, Rich SS, Spence MA, MacCluer JW (eds) Genetic Analysis Workshop 13: Analysis of longitudinal family data for complex diseases and related risk factors. BMC Genet 4: S59, 1-6

Institute for Response-Genetics (e.V.)

Chairman: Prof. Dr. Hans H. Stassen

Psychiatric Hospital (KPPP), University of Zurich
