Genetic Vector Spaces

Following a generalized haplotype model, subsets of loci are systematically combined into multidimensional feature vectors, which implicitly span a genetic vector space. Indeed, a theorem of Young and Householder shows that the smallest Euclidean vector space containing a given set of points can be constructed from their mutual distances alone [Young and Householder 1938]. The Householder-Torgerson formula

$$b_{jk} = \frac{1}{2}\left(\frac{1}{n}\sum_{i=1}^{n} d_{ik}^{2} + \frac{1}{n}\sum_{i=1}^{n} d_{ji}^{2} - \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{l=1}^{n} d_{il}^{2} - d_{jk}^{2}\right)$$

gives a routine method for computing, directly from the inter-individual genetic distances d_jk, a matrix (b_jk) of scalar products between the n points, with the points’ centroid as origin [Torgerson 1985]. This matrix is then factored by any of the usual factoring procedures to obtain the projections of the n points onto the r ≤ n orthogonal axes of the genetic vector space. Once the factor matrix has been obtained, it may be rotated and translated to a genetically meaningful set of dimensions using suitable criteria, such as (1) maximal inter-individual scattering, (2) maximal variance explained by a fixed number of eigenvectors, (3) decomposition of the sample into "natural" subgroups, or (4) maximal correlation with a quantitative phenotype.
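The construction described above can be illustrated with a minimal sketch in Python with NumPy. It assumes that the "usual factoring procedure" is an eigendecomposition of the scalar-product matrix, which is one common choice and not necessarily the one used by the authors. The function name `classical_mds`, the tolerance `tol`, and the toy distance matrix (four points at the corners of a 3 x 4 rectangle) are illustrative assumptions rather than genetic data.

```python
import numpy as np

def classical_mds(d, tol=1e-9):
    """Recover point coordinates from an (n, n) matrix of mutual distances."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Double centering: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T.
    # B holds the scalar products b_jk relative to the centroid.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Factor B by eigendecomposition (B is symmetric, so eigh applies).
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only axes with a clearly positive eigenvalue (r <= n axes).
    keep = eigvals > tol
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return coords, eigvals

# Toy distances (assumption): corners of a 3 x 4 rectangle, diagonal 5.
d = np.array([[0, 3, 5, 4],
              [3, 0, 4, 5],
              [5, 4, 0, 3],
              [4, 5, 3, 0]], dtype=float)

coords, eigvals = classical_mds(d)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(coords.shape)                 # (4, 2): two axes suffice
print(np.allclose(recovered, d))    # True: mutual distances are reproduced
```

The recovered coordinates are determined only up to rotation and reflection, which is why a subsequent rotation to genetically meaningful dimensions is possible without changing the mutual distances.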

Nonmetric Multidimensional Scaling

Subjects are represented in a genetic vector space as points, positioned so that the mutual distances between subjects are inversely related to their genetic similarity. In other words, any distance function suitable for quantifying the genetic similarity between multidimensional allelic patterns can be used to construct a genetic vector space in which genetically similar subjects form compact clouds ("clusters"), while genetically dissimilar subjects are located in more distant regions. Subjects’ genetic properties, as assessed through a set of candidate genes, are represented by their coordinates in a multidimensional vector space. Comparing the genetic similarity of unrelated subjects with that derived from parent-offspring comparisons leads to distributions that typically differ by two standard deviations (for feature vectors of length ≥ 15 and sufficiently polymorphic microsatellites). For nonmetric genetic distances, as is the case with set-theoretical similarity or distance measures, the Householder-Torgerson formula yields only an approximate vector space, which must be iteratively fitted by means of a nonmetric multidimensional scaling (NMDS) procedure in order to derive a sufficiently accurate Euclidean representation of the underlying genetic vector space. Several different optimization criteria are in use [Davison 1983].
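As an illustration of what such an optimization criterion can look like, the following sketch computes Kruskal's stress-1, one widely used NMDS criterion. The text does not state which criterion is applied here, so this choice is an assumption. The helper names `pava` and `kruskal_stress` and the three-subject toy data are hypothetical. The criterion depends only on the rank order of the dissimilarities, which is what makes it suitable for nonmetric distances.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    blocks = []  # each block is [mean, size]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the nondecreasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, size2 = blocks.pop()
            mean1, size1 = blocks.pop()
            size = size1 + size2
            blocks.append([(mean1 * size1 + mean2 * size2) / size, size])
    return np.concatenate([np.full(size, mean) for mean, size in blocks])

def kruskal_stress(dissim, coords):
    """Stress-1 of a configuration against rank-ordered dissimilarities."""
    dissim = np.asarray(dissim, dtype=float)
    coords = np.asarray(coords, dtype=float)
    iu = np.triu_indices(dissim.shape[0], k=1)   # each pair once
    delta = dissim[iu]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)[iu]
    # Disparities: monotone regression of the configuration distances
    # on the rank order of the dissimilarities.
    order = np.argsort(delta, kind="stable")
    disparities = np.empty_like(dist)
    disparities[order] = pava(dist[order])
    return np.sqrt(np.sum((dist - disparities) ** 2) / np.sum(dist ** 2))

# Toy dissimilarities (assumption): squared distances of points 0, 1, 3.
dissim = np.array([[0, 1, 9],
                   [1, 0, 4],
                   [9, 4, 0]], dtype=float)

stress_good = kruskal_stress(dissim, [[0.0], [1.0], [3.0]])  # ranks preserved
stress_bad = kruskal_stress(dissim, [[0.0], [2.0], [3.0]])   # ranks violated
print(float(stress_good))           # 0.0
print(round(float(stress_bad), 3))  # 0.189
```

An NMDS procedure would start from the approximate configuration and move the points iteratively so as to reduce this value. The first configuration already has zero stress although the dissimilarities are not Euclidean distances, because only their order matters.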

Genotype-Phenotype Correlations

The genetic vector space approach allows one, for example, to structurally decompose a genetically heterogeneous sample into more homogeneous "natural" subgroups, which give rise to the notion of "biological ethnicity". Figure 9 shows a bimodal distribution of inter-individual genetic distances derived from unrelated subjects, which indicates the existence of at least two subgroups. In terms of the theory of genetic vector spaces, the question of oligogenic association between genotype and vulnerability to functional psychoses becomes whether or not there exist configurations of candidate genes that span genetic vector spaces in which subjects with similar combinations of psychopathological syndromes (similar patterns of syndrome scores), or with similar time characteristics of psychotropic drug response, are located close to each other. In fact, such configurations give rise to significant correlations between the subjects’ "coordinates" in the genetic vector space (quantitative genotype) and their psychopathological syndrome scores and time characteristics of psychotropic drug response (quantitative phenotype). It is particularly worth noting that the genetic vector space approach enables a "true" genotype-to-phenotype research strategy insofar as the observed structure on the genotype level may be linked to features on the phenotype level to "explain" this structure.
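The correlation between quantitative genotype and quantitative phenotype can be sketched as a Pearson correlation between each axis of the genetic vector space and a phenotype score. This is a simplified illustration under assumptions: the function name `axis_phenotype_correlations` is hypothetical, the five subjects and their scores are made-up toy values, and the text does not specify which correlation measure or significance test is actually used.

```python
import numpy as np

def axis_phenotype_correlations(coords, phenotype):
    """Pearson correlation of each coordinate axis with a phenotype score."""
    coords = np.asarray(coords, dtype=float)        # shape (subjects, axes)
    phenotype = np.asarray(phenotype, dtype=float)  # shape (subjects,)
    xc = coords - coords.mean(axis=0)               # center each axis
    yc = phenotype - phenotype.mean()               # center the scores
    return (xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))

# Toy data (assumption): five subjects in a two-dimensional space.
coords = np.array([[-2.0,  1.0],
                   [-1.0, -2.0],
                   [ 0.0,  2.0],
                   [ 1.0, -2.0],
                   [ 2.0,  1.0]])
# Hypothetical syndrome score that depends linearly on the first axis only.
syndrome_score = 10.0 + 3.0 * coords[:, 0]

r = axis_phenotype_correlations(coords, syndrome_score)
print(np.round(r, 3))
```

In this toy case the first axis correlates perfectly with the score and the second not at all. With real data, a rotation of the space toward the direction of maximal correlation, followed by a significance test, would be the natural next step.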

References

Davison ML (1983) Multidimensional Scaling, Nonmetric group solutions. Wiley, p 82-120

Stassen HH, Hoffmann K, Scharfetter C (2003) Similarity by state/descent and genetic vector spaces: Analysis of a longitudinal family study. In: Almasy L, Amos CI, Bailey-Wilson JE, Cantor RM, Jaquish CE, Martinez M, Neuman RJ, Olson JM, Palmer LJ, Rich SS, Spence MA, MacCluer JW (eds) Genetic Analysis Workshop 13: Analysis of longitudinal family data for complex diseases and related risk factors. BMC Genet 4: S59, 1-6

Torgerson WS (1985) Theory and methods of scaling. R.E. Krieger Publishing Company, Malabar/Florida

Young G, Householder AS (1938) Discussion of a set of points in terms of their mutual distances. Psychometrika 3(1): 19-22

Professor Dr. med. Christian Scharfetter

Dept. of Psychiatry, Psychotherapy & Psychosomatics

Psychiatric Hospital, University of Zurich
