Institute for Response-Genetics (e.V.), University of Zurich

Quantifying Genetic Diversity

Our approach to quantifying the genetic diversity associated with a catalog of genes relies on genetic "vectors", which are assembled per gene from the genotypes of 4-8 polymorphic SNPs located within the gene under investigation. In this study we used 100 specifically selected genes that had previously been hypothesized to be relevant to psychiatric disorders. A gene with "m" SNPs yields a 2**m-dimensional genetic vector, so the length of the vectors can vary from gene to gene. The maximum possible number of genotypes for a gene with "m" SNPs is then 4**m. However, because SNPs located within a gene are in many cases strongly correlated, the number of different genotypes actually observed in the population of interest is much smaller. It depends on the particular gene, on the SNPs chosen to make up the respective genetic vector, and on the population studied (number of observations, biological ethnicity). When used to resolve subtle differences in population structure, a gene with a large number of observable genotypes is more informative than a gene with just a few genotypes. In other words, variation means information. The genotypic diversity in our study was found to be extremely large (on the order of 100**100), so that it was not at all straightforward to establish the anticipated link with psychiatric disorders.
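
As a rough illustration of the counting argument above, the following Python sketch compares the theoretical maximum of 4**m genotypes with the number of distinct genotypes actually observed in a sample. The encoding (one code from {0, 1, 2, 3} per SNP), the function names and the toy data are assumptions made for illustration only; they are not taken from the study.

```python
from collections import Counter

def max_genotypes(m):
    """Upper bound on distinct genotypes for a gene with m SNPs (4 per SNP)."""
    return 4 ** m

def observed_genotypes(vectors):
    """Count how often each distinct genotype occurs in a sample."""
    return Counter(tuple(v) for v in vectors)

# Toy sample: 6 individuals, one gene with m = 4 SNPs.
# Each entry is a hypothetical per-SNP genotype code in {0, 1, 2, 3}.
sample = [
    (0, 0, 1, 3),
    (0, 0, 1, 3),
    (2, 2, 1, 3),
    (0, 0, 1, 3),
    (2, 2, 0, 3),
    (2, 2, 1, 3),
]

counts = observed_genotypes(sample)
m = len(sample[0])
print(max_genotypes(m))  # 256 possible genotypes
print(len(counts))       # 3 genotypes actually observed
```

In this toy sample the first two SNPs always carry the same code, which is the kind of correlation that keeps the observed number of genotypes far below the theoretical maximum.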

Learning to Recognize

Once a set of genetic vectors is available for sufficiently representative samples of the populations under investigation, methods of Artificial Intelligence (AI) can be used to detect genotype patterns that are unique to a population and that contribute to discriminating between populations ("supervised learning"). The same methodological framework can also be used to develop a model of biological ethnicity ("unsupervised learning"). There is one critically important caveat: the genetic vector method is very sensitive to missing SNP data, because beyond a certain point missing genotypes cause the "noise level" to increase unacceptably.
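
The missing-data caveat can be made concrete with a small sketch in which vectors with too many missing SNP calls are excluded before any learning step. The use of `None` for a missing call, the threshold value and the function names are hypothetical; the text above does not state how missing data were actually handled or where the critical point lies.

```python
def missing_fraction(vector):
    """Fraction of SNP calls in one genetic vector that are missing (None)."""
    return sum(call is None for call in vector) / len(vector)

def filter_vectors(vectors, max_missing=0.1):
    """Keep only vectors whose missing fraction does not exceed max_missing."""
    return [v for v in vectors if missing_fraction(v) <= max_missing]

# Hypothetical genetic vectors for one gene with 4 SNPs.
vectors = [
    (0, 1, 2, 3),        # complete
    (0, None, 2, 3),     # 25% missing
    (None, None, 2, 3),  # 50% missing
]

kept = filter_vectors(vectors, max_missing=0.25)
print(len(kept))  # 2
```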

Normative Data

When comparing populations in terms of genetic diversity, it is essential that the results are corrected for differences in sample size. We created the prerequisite for such corrections by analyzing the genetic diversity of our total sample (n=1,698) using 32-fold repeated random sampling of subsamples of size 50 to 1,500 in steps of 50. The following Table shows the expected values of genetic diversity for 10 genes and sample sizes ranging from 100 to 1,000. Because the underlying functions are well-behaved, extrapolation is possible to population sizes beyond n=1,698.
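
The resampling scheme described above might be sketched as follows. This is a minimal illustration under stated assumptions: diversity is taken to be the number of distinct genotypes in a subsample, subsamples are drawn without replacement, and the population is synthetic stand-in data. None of these choices is confirmed by the text, and the function names are hypothetical.

```python
import random

def diversity(vectors):
    """Assumed diversity measure: number of distinct genotypes in a sample."""
    return len(set(vectors))

def expected_diversity(population, size, repeats, rng):
    """Mean diversity over `repeats` random subsamples drawn without replacement."""
    total = 0
    for _ in range(repeats):
        total += diversity(rng.sample(population, size))
    return total / repeats

def diversity_curve(population, sizes, repeats=32, seed=0):
    """Expected diversity for each subsample size in `sizes`."""
    rng = random.Random(seed)
    return {n: expected_diversity(population, n, repeats, rng) for n in sizes}

# Synthetic stand-in for one gene in a sample of n = 1,698 individuals:
# 4 SNPs, each with a hypothetical genotype code in {0, 1, 2}.
data_rng = random.Random(1)
population = [
    tuple(data_rng.choice((0, 1, 2)) for _ in range(4)) for _ in range(1698)
]

# Subsample sizes 50, 100, ..., 1,500, with 32 repeats each.
curve = diversity_curve(population, range(50, 1501, 50))
for n in (100, 500, 1000):
    print(n, curve[n])
```

With such a curve in hand, two populations of different size can be compared at a common subsample size instead of at their raw sample sizes.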

Genetic Diversity as a Function of Population Size


Institute for Response-Genetics (e.V.)

Chairman: Prof. Dr. Hans H. Stassen

Psychiatric Hospital (KPPP), University of Zurich
