Institute for Response-Genetics (e.V.)

Prof. Dr. Hans H. Stassen, Chairman

(Formerly Associated Institute of the University of Zurich)


Quantifying Genetic Similarity

"Similarity By State" versus "Similarity By Descent"

Central to our oligogenic approach to quantifying genetic diversity is the similarity function, which quantifies the genetic distance d(xi,xj) between the feature vectors xi, xj made up of the allelic patterns of any two subjects i, j at loci l1, l2, ..., ln. We use a nonmetric set-theoretical similarity measure that was designed primarily to assess "Similarity By State" (SBS) [Goldstein et al. 1995; Slatkin 1995; Kimmel et al. 1996; Pritchard and Rosenberg 1999], yet also allows one to model "Similarity By Descent" (SBD) in a cross-sectional way. The measure is based on a stepwise mutation model of microsatellites [Kimmel et al. 1996; Chakraborty et al. 1997; Kimmel et al. 1998]. It evaluates the fragment sizes [bp] of microsatellite alleles by analyzing the joint rectangular "areas" spanned by the 2 alleles of each microsatellite (multilocus "allelic patterns"). The overall similarity between the allelic patterns xi, xj of two subjects i, j is quantified through the set-theoretical intersection ("⋂": area shared by the two patterns) and the set-theoretical union ("⋃": total area involved):

[Formula: similarity s(xi,xj)]

with wk designating the weight of the feature vector's k-th component and X.k the area spanned by the two alleles Ak1, Ak2 of the k-th component; the similarity s satisfies 0 ≤ s ≤ 1. The weights may either be set to unity or be optimized using the allele frequencies of the general population, since concordance in a common allele may carry less weight than concordance in a rare allele.
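The formula itself is not reproduced in this text, so the following is only a minimal sketch of one form consistent with the description: s(xi,xj) = Σk wk·|Xik ⋂ Xjk| / Σk wk·|Xik ⋃ Xjk|. Several details are assumptions rather than the authors' definition: the ratio of weighted sums (instead of a weighted sum of per-locus ratios), the anchoring of each rectangle at a common origin so that its sides are the two sorted allele sizes, and the hypothetical `baseline` parameter that shifts this origin. The function names are illustrative.

```python
def similarity(x_i, x_j, weights=None, baseline=0.0):
    """Set-theoretical similarity between two multilocus allelic patterns.

    x_i, x_j : lists of (allele1, allele2) fragment sizes in bp, one pair per locus.
    weights  : optional per-locus weights w_k (defaults to unity).
    baseline : ASSUMPTION, a common origin subtracted from every fragment size;
               it must be smaller than every allele size.
    """
    if len(x_i) != len(x_j):
        raise ValueError("both patterns must cover the same loci")
    if weights is None:
        weights = [1.0] * len(x_i)

    shared = 0.0  # weighted sum of intersection areas
    total = 0.0   # weighted sum of union areas
    for g_i, g_j, w in zip(x_i, x_j, weights):
        # Sort the two alleles so that the genotype is treated as unordered.
        lo_i, hi_i = sorted(a - baseline for a in g_i)
        lo_j, hi_j = sorted(a - baseline for a in g_j)
        if lo_i <= 0 or lo_j <= 0:
            raise ValueError("baseline must be smaller than every allele size")
        # ASSUMPTION: rectangles share one corner at the origin, so the
        # overlap is the rectangle spanned by the smaller side in each direction.
        inter = min(lo_i, lo_j) * min(hi_i, hi_j)
        union = lo_i * hi_i + lo_j * hi_j - inter
        shared += w * inter
        total += w * union
    return shared / total if total > 0 else 0.0


# Two subjects typed at two loci (fragment sizes in bp, made-up values).
subject_a = [(150, 154), (201, 209)]
subject_b = [(150, 158), (201, 209)]
print(similarity(subject_a, subject_b, baseline=146))
```

Under these assumptions identical patterns give s = 1, and s decreases toward 0 as the fragment sizes diverge. Frequency-based weights, as mentioned above, could be passed through `weights`; how they are derived from population allele frequencies is not specified here.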

Interactions Between Genomic Loci

Interactions between genomic loci enter this model through the similarity function: adding a single locus to the configuration may leave the genetic similarities unchanged, whereas adding pairs of loci may lead to a significant increase or decrease in genetic similarity. Significant variations in observed genetic similarity can be caused by polymorphisms that are either functional themselves or located close to another functional polymorphism, with the notion "functional" also encompassing genetic mechanisms not yet discovered.

Biological Ethnicity

Subjects are represented as points in a genetic vector space and positioned in such a way that the mutual distances between subjects are inversely related to their genetic similarity. Thus, genetically similar subjects form compact clouds ("clusters"), while genetically dissimilar subjects are located in more distant regions. Subjects' genetic properties, as assessed through a set of polymorphisms, are represented by their coordinates in a multidimensional vector space. Figure 6 compares the genetic similarity of unrelated subjects with that derived from parent-offspring comparisons. The respective distributions typically differ by two standard deviations for feature vectors of length ≥ 15 and arbitrarily chosen microsatellites. The genetic vector space approach allows one, for example, to structurally decompose a genetically heterogeneous sample into more homogeneous "natural" subgroups, which give rise to the notion of "biological ethnicity" [Figure 8].
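The text does not state how the coordinates are computed. As an illustration only, the sketch below uses classical multidimensional scaling (principal coordinates analysis), a standard technique swapped in here, to place subjects in a plane from a matrix of pairwise similarities. The conversion d = 1 - s and the function name `embed_2d` are assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def embed_2d(similarity_matrix):
    """Project subjects onto the plane of the two leading eigenvectors.

    similarity_matrix : symmetric (n x n) array of pairwise similarities in [0, 1].
    Returns an (n x 2) array of coordinates.
    """
    s = np.asarray(similarity_matrix, dtype=float)
    d = 1.0 - s                               # ASSUMPTION: distance = 1 - similarity
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:2]         # indices of the two largest
    # A nonmetric measure can yield negative eigenvalues; clip them to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Four subjects forming two similar pairs (made-up similarities).
s = [[1.0, 0.9, 0.2, 0.2],
     [0.9, 1.0, 0.2, 0.2],
     [0.2, 0.2, 1.0, 0.9],
     [0.2, 0.2, 0.9, 1.0]]
coords = embed_2d(s)
print(coords)
```

In this toy input the two pairs land on opposite sides of the first axis, which is the kind of cluster separation described above. Any clustering step applied to such coordinates would be a further choice that the text does not specify.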


Chakraborty R, Kimmel M, Stivers DN, Davison LJ, Deka R (1997): Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc Natl Acad Sci USA 94: 1041-1046
Di Rienzo A, Donnelly P, Toomajian C, Sisk B, Hill A, Petzl-Erler ML, Haines GK, Barch DH (1998): Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148: 1269-1284
Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW (1995): An evaluation of genetic distances for use with microsatellite loci. Genetics 139: 463-471
Kimmel M, Chakraborty R, Stivers DN, Deka R (1996): Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143: 549-555
Kimmel M, Chakraborty R, King JP, Bamshad M, Watkins WS, Jorde LB (1998): Signatures of population expansion in microsatellite repeat data. Genetics 148: 1921-1930
Pritchard JK, Rosenberg NA (1999): Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 65: 220-228
Slatkin M (1995): A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462
Stassen HH, Hoffmann K, Scharfetter C (2003): Similarity by state/descent and genetic vector spaces: Analysis of a longitudinal family study. In: Almasy L, Amos CI, Bailey-Wilson JE, Cantor RM, Jaquish CE, Martinez M, Neuman RJ, Olson JM, Palmer LJ, Rich SS, Spence MA, MacCluer JW (eds) Genetic Analysis Workshop 13: Analysis of longitudinal family data for complex diseases and related risk factors. BMC Genet 4: S59, 1-6


Genetic vector space spanned by 20 polymorphic markers on chromosomes 6, 11 and 22 reveals differences between ethnic groups: circles designate Afro-Americans (n = 141), triangles non-Afro-Americans (n = 111), and squares Swiss subjects (n = 257). Subjects are projected onto the hyperplane defined by the eigenvectors associated with the two largest eigenvalues.