Predicting IgM Levels from Multilocus Genotype
A genetically predisposed aberrancy of the inflammatory response system has been linked
to various complex diseases. Of particular clinical interest, therefore, are "objective"
classifiers that enable reliable prediction and offer the opportunity for early
intervention prior to the onset of clinical manifestations. To investigate the extent to
which IgM levels can be reproducibly predicted for
each individual patient from his/her multilocus genotype, we carried out a Neural Network
(NN) analysis on a sufficiently large sample (n=1,042; genotyped for 5,728 SNPs of a
conventionally designed 0.4 Mb genome scan) using 10-fold cross-validation. Since NN
results tend to be over-optimistic, even under stringent cross-validation, we were
interested in the reproducibility of predictors
across populations ("training" versus "test" samples) and across SNP sets (conventionally
designed genome scan versus anonymous 500k-chip). To address these questions, we relied
on independent test samples (n=746; genotyped for 545,080 SNPs of a 500k-chip) along with
6 different SNP sets, each with 5,728 SNPs drawn from the 500k-chip under the constraint
of maximum informativeness and compatibility with the training SNPs.
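
The 10-fold cross-validation mentioned above can be illustrated with a minimal sketch.
It only shows how n=1,042 samples might be partitioned into 10 disjoint folds; the
function names, the random seed, and the shuffling strategy are assumptions made for
illustration and are not taken from the original analysis.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split indices 0..n_samples-1 into k disjoint folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)      # seed is an arbitrary illustrative choice
    return [idx[i::k] for i in range(k)]  # every k-th shuffled index goes to fold i

def cross_validation_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Sample size taken from the text; each sample is held out exactly once.
splits = list(cross_validation_splits(1042, k=10))
```

Each of the 10 models would be trained on about 938 samples and evaluated on the
remaining 104 or 105, so that every sample contributes exactly one held-out prediction.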
SNP Selection for Cross-Validation
Based on NCBI36 data, the coordinates X(k) of the 5,728 SNPs of our training sample were
used to define surrounding intervals X(k)±0.1 Mb (k = 1, 2, ..., 5,728). Typically 50-80
SNPs of the 500k-chip were located in each of these intervals and served as the pool for
selecting 8 "optimal" SNPs in terms of informativeness and vicinity to the original locus
at X(k).
Finally, 6 subsets of 5,728 SNPs each were constructed by randomly combining SNPs from
each interval [Figure]. This process led to mutual overlaps between the 6 subsets in the
range of 14.6-16.6%. Due to missing data, typically 40 SNPs (0.7%) of each resulting set
had to be excluded from analysis, so that on average only 5,688 SNPs per set were
available for testing.
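
The selection procedure described above can be sketched as follows. The window size,
the 8 candidates per interval, and the 6 sets come from the text. Everything else is an
assumption made for illustration: the scoring function (heterozygosity weighted by
closeness to the training locus), the single-chromosome toy data, and all function
names. The text does not specify how informativeness and vicinity were actually
combined.

```python
import random

WINDOW_BP = 100_000  # +/- 0.1 Mb around each training SNP (from the text)
N_BEST = 8           # "optimal" chip SNPs kept per interval (from the text)
N_SETS = 6           # number of test SNP sets (from the text)

def score(chip_snp, center):
    """Hypothetical score: heterozygosity 2p(1-p), down-weighted by distance."""
    maf = chip_snp["maf"]
    het = 2.0 * maf * (1.0 - maf)
    closeness = 1.0 - abs(chip_snp["pos"] - center) / WINDOW_BP
    return het * closeness

def best_candidates(center, chip_snps):
    """Return ids of up to N_BEST chip SNPs inside the window, best first."""
    pool = [s for s in chip_snps if abs(s["pos"] - center) <= WINDOW_BP]
    pool.sort(key=lambda s: score(s, center), reverse=True)
    return [s["id"] for s in pool[:N_BEST]]

def build_sets(candidates_per_locus, seed=0):
    """Draw one candidate per locus at random, independently for each set."""
    rng = random.Random(seed)
    return [[rng.choice(c) for c in candidates_per_locus if c]
            for _ in range(N_SETS)]

def overlap(set_a, set_b):
    """Fraction of loci at which two sets picked the same SNP."""
    return sum(a == b for a, b in zip(set_a, set_b)) / len(set_a)

# Toy data on one chromosome; ids, positions and frequencies are made up.
rng = random.Random(1)
chip = [{"id": f"snp{i}", "pos": rng.randrange(0, 5_000_000),
         "maf": rng.uniform(0.05, 0.5)} for i in range(3000)]
training_positions = list(range(200_000, 4_800_000, 400_000))
candidates = [best_candidates(x, chip) for x in training_positions]
snp_sets = build_sets(candidates)
```

With 8 candidates per interval and uniform, independent draws, two sets would agree at
about 1/8 = 12.5% of loci in expectation. The 14.6-16.6% reported in the text is
somewhat higher, so the actual procedure presumably differed from this sketch in some
detail.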
Reproducibility of Multilocus Configuration
When genomic loci are defined as clusters of at least 3 SNPs within a 0.5 Mb region, the
training step yielded a configuration of 15 genomic loci (61 SNPs) that served as the
reference for subsequent investigations into the reproducibility of classifiers across
populations and SNP sets.
Unexpectedly, the same algorithm applied to the 746 test samples with the 6 competing
SNP sets typically yielded relatively reproducible results for 4 of the 6 SNP sets,
whereas the results for the other 2 SNP sets consistently turned out to be largely
arbitrary. Given current results, no more than 5 of the 15 genomic loci derived from the
training samples appear to be reproducible in the test samples and independent of the
SNP set.
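
The cluster criterion (at least 3 SNPs within a 0.5 Mb region) and the comparison of
loci between training and test results can be sketched as below. The greedy left-to-right
scan, the overlap-based notion of a "reproduced" locus, and all names and coordinates
are assumptions made for illustration; the text does not state how clusters were
delimited or matched.

```python
REGION_BP = 500_000  # 0.5 Mb region (from the text)
MIN_SNPS = 3         # minimum cluster size (from the text)

def find_clusters(snps):
    """Group (chrom, pos) pairs into clusters of >= MIN_SNPS within REGION_BP.

    Assumption: a cluster starts at the leftmost unassigned SNP and takes
    every SNP on the same chromosome no more than REGION_BP downstream.
    """
    ordered = sorted(snps)
    clusters, i = [], 0
    while i < len(ordered):
        chrom, start = ordered[i]
        j = i
        while (j + 1 < len(ordered) and ordered[j + 1][0] == chrom
               and ordered[j + 1][1] - start <= REGION_BP):
            j += 1
        if j - i + 1 >= MIN_SNPS:
            clusters.append(ordered[i:j + 1])
            i = j + 1  # continue after the cluster
        else:
            i += 1     # too few SNPs; try the next start position
    return clusters

def span(cluster):
    """Return (chrom, first_pos, last_pos) of a sorted cluster."""
    return cluster[0][0], cluster[0][1], cluster[-1][1]

def reproduced(reference, test):
    """Count reference clusters overlapping a test cluster on the same chromosome."""
    test_spans = [span(t) for t in test]
    count = 0
    for ref in reference:
        rc, r_lo, r_hi = span(ref)
        if any(tc == rc and t_lo <= r_hi and t_hi >= r_lo
               for tc, t_lo, t_hi in test_spans):
            count += 1
    return count

# Made-up predictor SNPs from a training run and from one test SNP set.
training_snps = [("1", 1_000_000), ("1", 1_200_000), ("1", 1_450_000),
                 ("1", 3_000_000), ("2", 500_000), ("2", 600_000),
                 ("6", 100), ("6", 200_000), ("6", 400_000), ("6", 600_000)]
test_snps = [("1", 1_100_000), ("1", 1_300_000), ("1", 1_500_000),
             ("6", 5_000_000), ("6", 5_100_000), ("6", 5_200_000)]
reference_loci = find_clusters(training_snps)
test_loci = find_clusters(test_snps)
n_reproduced = reproduced(reference_loci, test_loci)
```

In this toy example the training SNPs form two loci (on chromosomes 1 and 6), and only
the chromosome 1 locus overlaps a locus found in the test data.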
References
Stassen HH, Szegedi A, Scharfetter C: Modeling Activation of Inflammatory Response
System. A Molecular-Genetic Neural Network Analysis. BMC Proceedings 2007, 1
(Suppl 1): S61, 1-6
Stassen HH, Anghelescu IG, Hell D, Hoffmann K, Rujescu D, Scharfetter C, Szegedi A,
Tadic A: Linking autoantibody formation to genetic vulnerability to psychiatric disorders
and psychotropic drug response. Int J Neuropsychopharmacol. 2008; 11 (Suppl. 1): 101
Stassen HH, Hoffmann K, Scharfetter C: The Difficulties of Reproducing Conventionally
Derived Results through 500k-Chip Technology. BMC Genet 2009; 3 Suppl 7: S66
Stassen HH, Braun S, Bridler R, Seifritz E, Weisbrod M: Inflammatory Processes and
Schizophrenia: Evidence from a Twin Study. Eur Neuropsychopharmacology 2017;
27 Suppl 4: S934-S935
Braun S, Bridler R, Müller N, Schwarz MJ, Seifritz E, Weisbrod M, Zgraggen A, Stassen HH:
Inflammatory Processes and Schizophrenia: Two Independent Lines of Evidence from a Study
of Twins Discordant and Concordant for Schizophrenic Disorders. European Archives of
Psychiatry and Clinical Neuroscience 2017; 267: 377-389
Stassen HH: Heterogeneity of schizophrenic disorders and link to chronically elevated
IgM values. Neurology, psychiatry and brain research 2018; 29: 23-24