Genetic Vector Spaces
Following a generalized haplotype model, subsets of loci are systematically combined into
multidimensional feature vectors which implicitly span a genetic vector space.
In fact, it follows from a mathematical theorem that the smallest Euclidean vector
space containing some set of vectors can be constructed given merely their mutual distances
[Young and Householder 1938]. The Householder-Torgerson formula
gives a routine method for computing directly from the inter-individual genetic distances djk
a matrix (bjk) of scalar products between points using the points’ centroid as origin
[Torgerson 1985]. This matrix is then factored by any of the usual factoring procedures to obtain the
projections of the n points onto the r ≤ n orthogonal axes of the genetic vector space. Once the
factor matrix has been obtained, it may be rotated and translated to a genetically meaningful
set of dimensions through the use of suitable criteria, such as (1) maximal inter-individual
scattering, (2) maximal variance explained by "n" eigenvectors, (3) decomposition of sample
into "natural" subgroups, or (4) maximal correlation with quantitative phenotype, amongst others.
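The construction described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name torgerson_coordinates, the tolerance tol, and the three-point distance matrix are introduced here for the example only, and the eigendecomposition stands in for "any of the usual factoring procedures".

```python
import numpy as np

def torgerson_coordinates(d, tol=1e-9):
    """Coordinates of n points from their n x n matrix of mutual distances.

    Hypothetical helper: double-centers the squared distances to obtain the
    scalar-product matrix (b_jk) with the centroid as origin, then factors it.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Centering matrix J = I - (1/n) 11^T moves the origin to the centroid.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Factor the symmetric matrix B; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the r axes with clearly positive eigenvalues (assumed threshold).
    keep = eigvals > tol * max(eigvals[0], 1.0)
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return coords, eigvals

# Illustrative distances only: three points forming a 3-4-5 right triangle.
d = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
coords, eigvals = torgerson_coordinates(d)
print(coords.shape)  # number of points and number of retained axes
```

For distances that are exactly Euclidean, the distances between the returned coordinates reproduce the input matrix; rotation or translation toward a genetically meaningful set of dimensions would be a separate step applied to coords.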
Nonmetric Multidimensional Scaling
Subjects are represented in a genetic vector space as points, positioned in such a way
that the mutual distances between subjects correspond inversely to their genetic similarity.
In other words, any distance function suitable for quantifying the genetic similarity between
multidimensional allelic patterns can be used to construct a genetic vector space where genetically
similar subjects form compact clouds ("clusters"), while genetically dissimilar subjects are located
in more distant regions. Subjects’ genetic properties, as assessed through a set of candidate genes,
are represented by their coordinates in a multidimensional vector space. Comparing the
genetic similarity of unrelated subjects with that derived from parent-offspring comparisons
leads to distributions that typically differ by two standard deviations (for feature vectors of
length ≥ 15 and sufficiently polymorphic microsatellites).
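As one hypothetical instance of such a distance function, the sketch below averages a Jaccard-type (set-theoretical) distance over the loci of two feature vectors. The measure, the function name allele_set_distance, and the genotype values are assumptions chosen for illustration; the text does not specify which measure is actually used.

```python
def allele_set_distance(a, b):
    """Mean Jaccard distance between two multilocus genotypes.

    a, b: sequences of equal length; each element is a pair of allele labels
    (e.g. microsatellite repeat lengths) observed at one locus.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    total = 0.0
    for genotype_a, genotype_b in zip(a, b):
        alleles_a, alleles_b = set(genotype_a), set(genotype_b)
        shared = len(alleles_a & alleles_b)
        pooled = len(alleles_a | alleles_b)
        total += 1.0 - shared / pooled  # 0 = identical allele sets, 1 = disjoint
    return total / len(a)

# Invented two-locus genotypes, for illustration only.
subject_1 = [(120, 124), (88, 88)]
subject_2 = [(120, 126), (88, 90)]
distance = allele_set_distance(subject_1, subject_2)
print(round(distance, 4))
```

Applying such a function to every pair of subjects yields the matrix of inter-individual distances from which the vector space is constructed.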
For nonmetric genetic distances, as is the case with set-theoretical similarity or distance measures,
the Householder-Torgerson formula yields only an "approximate" vector space to be iteratively
"fitted" by means of a nonmetric multidimensional scaling (NMDS) procedure in order to derive a
sufficiently accurate Euclidean representation of the underlying genetic vector space.
Several different optimization criteria are in use [Davison 1983].
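The iterative fitting can be illustrated with a short sketch. It uses one common scheme, monotone (isotonic) regression of the current distances on the rank order of the dissimilarities followed by a Guttman-transform update, and reports Kruskal's stress-1. This is only one of the several optimization criteria in use and is not claimed to be the one applied in the source; the names nmds, pava and pairwise_distances and the random test data are assumptions.

```python
import numpy as np

def pairwise_distances(x):
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def nmds(delta, start, n_iter=200):
    """Return the lowest-stress configuration seen and its stress-1 value.

    delta: symmetric dissimilarity matrix; start: non-degenerate initial
    coordinates (for example from the Householder-Torgerson formula).
    """
    delta = np.asarray(delta, dtype=float)
    x = np.array(start, dtype=float)
    n = delta.shape[0]
    iu = np.triu_indices(n, k=1)
    order = np.argsort(delta[iu], kind="stable")
    best_x, best_stress = x.copy(), np.inf
    for _ in range(n_iter):
        d = pairwise_distances(x)
        d_flat = d[iu]
        # Disparities: monotone in the dissimilarities, closest to d_flat.
        fitted = np.empty_like(d_flat)
        fitted[order] = pava(d_flat[order])
        stress = np.sqrt(((d_flat - fitted) ** 2).sum() / (d_flat ** 2).sum())
        if stress < best_stress:
            best_x, best_stress = x.copy(), stress
        # Rescale disparities to the current size to limit shrinkage.
        fitted = fitted * np.sqrt((d_flat ** 2).sum() / (fitted ** 2).sum())
        dhat = np.zeros((n, n))
        dhat[iu] = fitted
        dhat = dhat + dhat.T
        # Guttman transform with unit weights.
        ratio = np.divide(dhat, d, out=np.zeros_like(d), where=d > 0)
        b = -ratio
        b[np.diag_indices(n)] = ratio.sum(axis=1)
        x = b @ x / n
    return best_x, best_stress

# Synthetic data: squared distances are a monotone, non-metric distortion.
rng = np.random.default_rng(0)
true_points = rng.normal(size=(8, 2))
delta = pairwise_distances(true_points) ** 2
start = rng.normal(size=(8, 2))
coords, stress = nmds(delta, start)
print(coords.shape, float(stress))
```

The function keeps the best configuration encountered rather than assuming that every iteration improves the fit, so the returned stress is never worse than that of the starting configuration.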
Genotype-Phenotype Correlations
The genetic vector space approach allows one, for example, to structurally decompose a genetically
heterogeneous sample into more homogeneous "natural" subgroups which give rise to the notion of
"biological ethnicity". Figure 9 shows a bimodal distribution of inter-individual genetic distances
derived from unrelated subjects, which indicates the existence of at least two subgroups.
By means of the theory of genetic vector spaces, the question of oligogenic association between
genotype and vulnerability to functional psychoses becomes whether or not there exist configurations
of candidate genes that span genetic vector spaces in which subjects with similar combinations of
psychopathological syndromes (similar patterns of syndrome scores) — or similar time
characteristics of psychotropic drug response — are located close to each other. In fact,
such configurations give rise to significant correlations between the subjects’ "coordinates" in
the genetic vector space (quantitative genotype) and psychopathological syndrome scores and the
time characteristics of psychotropic drug response (quantitative phenotype). It is particularly
worth noting that the genetic vector space approach enables a "true" genotype-to-phenotype research
strategy in so far as the observed structure on the genotype level may be linked to features on
the phenotype level to "explain" this structure.
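A correlation between quantitative genotype and quantitative phenotype can be computed per axis of the genetic vector space. The sketch below uses the Pearson coefficient as an assumed choice; the function name axis_phenotype_correlations and all numbers are invented for illustration and are not results from the study. Assessing significance would require an additional step, such as a permutation test, which is not shown.

```python
import numpy as np

def axis_phenotype_correlations(coords, phenotype):
    """Pearson correlation of each coordinate axis with a phenotype score.

    coords: (n_subjects, r) coordinates in the genetic vector space.
    phenotype: (n_subjects,) quantitative score, e.g. a syndrome score.
    """
    coords = np.asarray(coords, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    xc = coords - coords.mean(axis=0)
    yc = phenotype - phenotype.mean()
    covariance = (xc * yc[:, None]).sum(axis=0)
    scale = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return covariance / scale

# Invented values: the score depends linearly on the first axis only.
coords = np.array([[1.0, 1.0],
                   [2.0, -1.0],
                   [3.0, -1.0],
                   [4.0, 1.0]])
score = 2.0 * coords[:, 0] + 1.0
r = axis_phenotype_correlations(coords, score)
print(np.round(r, 3))
```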
References
Davison ML (1983) Multidimensional Scaling, Nonmetric group solutions. Wiley, p 82-120
Stassen HH, Hoffmann K, Scharfetter C (2003) Similarity by state/descent and genetic vector spaces:
Analysis of a longitudinal family study. In: Almasy L, Amos CI, Bailey-Wilson JE, Cantor RM,
Jaquish CE, Martinez M, Neuman RJ, Olson JM, Palmer LJ, Rich SS, Spence MA, MacCluer JW (eds)
Genetic Analysis Workshop 13: Analysis of longitudinal family data for complex diseases and
related risk factors. BMC Genet 4: S59, 1-6
Torgerson WS (1985) Theory and methods of scaling. R.E. Krieger Publishing Company, Malabar/Florida
Young G, Householder AS (1938) Discussion of a set of points in terms of their mutual distances.
Psychometrika 3(1): 19-22