Speech characteristics can be roughly described by a few major features: speech flow, loudness,
intonation and intensity of overtones. Speech flow describes the speed at which utterances are
produced as well as the number and duration of temporary breaks in speaking. Loudness reflects
the amount of energy associated with the articulation of utterances and, when regarded as a
time-varying quantity, the speaker's dynamic expressiveness. Intonation is the manner of producing
utterances with respect to rise and fall in pitch, and leads to tonal shifts in either
direction from the speaker's mean vocal pitch. Overtones are the higher tones which faintly
accompany a fundamental tone, thus being responsible for the tonal diversity of sounds.
Our approach to analyzing the nonverbal information contained in human speech is based on the
results of a normative study on 192 healthy volunteers. The design of this study, with three
different types of text and two repeated measurements at 14-day intervals, was chosen to
investigate the reproducibility of speech parameters over time, and to analyze the sensitivity
of speech parameters with respect to form and content of spoken text. In detail, we determined
(1) the optimum recording time required for a reliable estimation of speech parameters, (2)
the distribution of speech parameters in the general population, (3) the intra-individual stability
of speech parameters over time which allows one to distinguish between "natural" fluctuations
and "significant" changes, (4) the differences between dialect and non-dialect, and between
affect-neutral and affect-charged speech, and (5) the amount of variance explained by the factors age,
gender and social status. Within the scope of this normative study we developed a practical
recording procedure that can be carried out routinely by a technician in a standardized setting.
All speech signals are inspected visually and marked with an artifact code if necessary, so that
disturbed intervals can be removed prior to data analysis. In the next step, segmentation tables are
set up to identify pauses and utterances; pauses shorter than 250 ms are skipped. We then
calculate "spectra" from 1-second epochs by means of a discrete Fourier transform, using "pure"
utterances from which the pauses have been removed.
Finally, we approximate the shape of the F0 distribution curve ("F0" designates
the mean vocal pitch of a speaker) by a second-degree polynomial and use the distance between the
symmetrical 6-dB points as a measure of the "F0-variability" (intonation). The height/width ratio
of the second-degree polynomial serves as a measure of the "F0-narrowness" (monotony). All frequency
differences are calculated in quartertones in order to allow direct comparison between speakers
independently of the speakers' mean vocal pitch.
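Two of the steps above can be illustrated with a minimal Python sketch: skipping pauses shorter than 250 ms, and expressing a frequency difference in quartertones. This is not the implementation used in the study. The function names, the representation of utterances as (start, end) times in seconds, and the assumption of the equal-tempered scale with 24 quartertones per octave are illustrative choices made here.

```python
import math

MIN_PAUSE_S = 0.250  # pauses shorter than 250 ms are skipped (value from the text)


def merge_short_pauses(utterances, min_pause=MIN_PAUSE_S):
    """Merge utterances separated by a gap shorter than min_pause.

    utterances: list of (start, end) times in seconds, sorted by start.
    (Hypothetical data layout; the study's segmentation tables may differ.)
    """
    merged = []
    for start, end in utterances:
        if merged and start - merged[-1][1] < min_pause:
            # Gap too short to count as a pause: extend the previous utterance.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def pauses_between(utterances):
    """Return the (start, end) pauses between consecutive utterances."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(utterances, utterances[1:])
    ]


def quartertones(f_ref_hz, f_hz):
    """Signed interval from f_ref_hz to f_hz in quartertones.

    Assumes 24 quartertones per octave (equal temperament).
    """
    return 24.0 * math.log2(f_hz / f_ref_hz)


if __name__ == "__main__":
    raw = [(0.0, 1.0), (1.1, 2.0), (2.5, 3.0)]
    segments = merge_short_pauses(raw)
    print(segments)                   # utterances after skipping short pauses
    print(pauses_between(segments))   # remaining pauses
    print(quartertones(110.0, 220.0)) # one octave up
```

Because the interval depends only on the frequency ratio, an octave corresponds to 24 quartertones whether the reference pitch is 110 Hz or 220 Hz, which is what makes values comparable across speakers with different mean vocal pitch.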
Braun S, Annovazzi C, Botella C, Bridler R, Camussi E, Delfino JP, Mohr C, Moragrega I, Papagno C,
Pisoni A, Soler C, Seifritz E, Stassen HH: Assessing Chronic Stress, Coping Skills and Mood Disorders
through Speech Analysis. A Self-Assessment "Voice App" for Laptops, Tablets, and Smartphones.
Psychopathology 2016; 49(6): 406-419
Delfino JP, Barragán E, Botella C, Braun S, Bridler R, Camussi E, Chafrat V, Lott P, Mohr C,
Moragrega I, Papagno C, Sanchez S, Seifritz E, Soler C, Stassen HH: Quantifying Insufficient
Coping Behavior under Chronic Stress. A cross-cultural study of 1,303 students from Italy,
Spain, and Argentina. Psychopathology 2015; 48: 230-239
Braun S, Botella C, Bridler R, Chmetz F, Delfino JP, Herzig D, Kluckner VJ, Mohr C, Moragrega I, Schrag Y,
Seifritz E, Soler C, Stassen HH: Affective State and Voice: Cross-Cultural Assessment of Speaking Behavior and
Voice Sound Characteristics. A Normative Multi-Center Study of 577+36 Healthy Subjects. Psychopathology 2014;
47(5): 327-340
Mohr C, Braun S, Bridler R, Chmetz F, Delfino JP, Kluckner VJ, Lott P, Schrag Y, Seifritz E, Stassen HH:
Insufficient Coping Behavior under Chronic Stress and Vulnerability to Psychiatric Disorders.
Psychopathology 2014; 47: 235-243
Stassen HH, Delfino JP, Kluckner VJ, Lott P, Mohr C: Vulnerabilität und psychische Erkrankung. Swiss Archives
of Neurology and Psychiatry 2014; 165(5): 152-157
Stassen HH (2004) Veränderungen der Sprechmotorik. In: T.Jahn (ed) Bewegungsstörungen bei psychischen
Erkrankungen. Springer Heidelberg: 107-125
Stassen HH, Angst J (2002) Wirkung und Wirkungseintritt in der Antidepressiva-Behandlung. In: Böker H and
Hell D (eds) Therapie der affektiven Störungen. Stuttgart und New York: Schattauer 141-165
Lott PR, Guggenbühl S, Schneeberger A, Pulver AE, Stassen HH (2002) Linguistic analysis of the speech
output of schizophrenic, bipolar, and depressive patients. Psychopathology 35(4): 220-227
Püschel J, Stassen HH, Bomben G, Scharfetter C, Hell D (1998) Speaking behavior and voice sound
characteristics in acute schizophrenia. J. Psychiatric Research 32, 89-97
Stassen HH, Kuny S, Hell D (1998) The speech analysis approach to determining onset of
improvement under antidepressants. Eur. Neuropsychopharmacology 8(4), 303-310
Kuny S, Stassen HH, Hell D (1997) Kognitive Beeinträchtigungen in der Depression.
Schweiz Arch Neurol Psychiatrie 150,3: 18-25
Stassen HH (1995) Affekt und Sprache. Stimm- und Sprachanalysen bei Gesunden, depressiven und
schizophrenen Patienten. Monographien aus dem Gesamtgebiete der Psychiatrie, Bd. 79. Berlin, Heidelberg: Springer
Stassen HH, Albers M, Püschel J, Scharfetter C, Tewesmeier M, Woggon B (1995) Speaking
behavior and voice sound characteristics associated with negative schizophrenia. J Psychiat Res. 29, 277-296
Kuny S, Stassen HH (1993) Speaking behavior and voice sound characteristics in depressive patients
during recovery. J Psychiat Res. 27, 289-307