SEGM: Segmentation of speech recordings into pauses and utterances
This program is used to "learn" the characteristics of a speech
signal recorded under given experimental conditions. The algorithm
expects a pause of at least 100 msec duration at the very beginning
of the recording, which it uses to calibrate the signal amplification
against the actual background noise level (optimization of
the signal-to-noise ratio). Once the calibration parameters have been
determined, the entire recording is processed in order to identify
pauses, utterances, and artifacts. The quality of the resulting
segmentation can be checked visually with the PLOT option.
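The calibrate-then-threshold idea described above can be illustrated with a minimal sketch. It is not SEGM's actual filter: the 10 msec frame length, the RMS envelope, and the threshold factor of 3 times the noise level are hypothetical choices, and the smoothing controlled by NGL and SCUT is not modeled. The function names are likewise illustrative.

```python
import math
from typing import List, Tuple

def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS envelope: one value per non-overlapping frame of frame_len samples."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env

def segment(samples: List[float], rate: int, frame_ms: float = 10.0,
            calib_ms: float = 100.0, factor: float = 3.0) -> List[Tuple[str, int, int]]:
    """Return runs (label, first_frame, n_frames) with label 'pause' or 'utterance'.

    Assumes the first calib_ms milliseconds contain only background noise.
    frame_ms and factor are hypothetical parameters, not SEGM settings.
    """
    frame_len = int(rate * frame_ms / 1000)
    env = frame_rms(samples, frame_len)
    n_calib = max(1, int(calib_ms / frame_ms))
    if len(env) < n_calib:
        raise ValueError("recording is shorter than the calibration pause")
    # Step 1: background noise level from the leading pause.
    noise = sum(env[:n_calib]) / n_calib
    threshold = factor * noise
    # Step 2: label each frame and merge consecutive frames with the same label.
    runs: List[Tuple[str, int, int]] = []
    for i, value in enumerate(env):
        label = "utterance" if value > threshold else "pause"
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1], runs[-1][2] + 1)
        else:
            runs.append((label, i, 1))
    return runs
```

For example, at 1000 Hz a recording with 200 samples of weak noise, 100 samples of loud signal, and 100 more samples of noise yields one 20-frame pause, one 10-frame utterance, and one 10-frame pause.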
Specification list: SEGM
------------------------------
A8 TANA Undefined
I4 NREC 0 Default-value
I4 NBIT 16 Default-value
I4 IBM 0 Default-value
I4 PROT 0 Default-value
I4 PLOT 0 Default-value
I4 PANF 1 Default-value
I4 PEND 10 Default-value
R4 AMPL 1.000 Default-value
I4 NGL 0 Default-value
R4 SCUT 7.500 Default-value
I4 LG30 30 Default-value
I4 LPRT 6 Default-value
I4 PROB 0 Default-value
I4 SAVE 0 Default-value
01 TANA Name of input tape
02 NREC Number of files to be processed
03 NBIT Specifies number of bits used in A/D-conversion
04 IBM Specifies data format (IBM/SPARC vs. X86/X64)
05 PROT Controls protocol output
06 PLOT Controls graphic output
07 PANF Specifies first page to be plotted
08 PEND Specifies last page to be plotted
09 AMPL Specifies amplification of signal prior to plotting
10 NGL Filter parameter for segmentation (smoothness)
11 SCUT Filter parameter for segmentation (flanks)
12 LG30 Logical unit number associated with input tape
13 LPRT Logical unit number of plot-device
14 PROB Specifies proband to be processed
15 SAVE Saves newly constructed segmentation table
16 DEMO Examples that illustrate program function
- TANA: Name of the input tape ("NO" means that the
DSN name will not be checked)
- NREC = 0: All files
= n: First "n" files are to be processed
- NBIT = n: Number of bits used in A/D-conversion (default=16)
- IBM = 0: Input data are stored in X86/X64-mode (little endian)
= 1: Input data are stored in IBM/SPARC-mode (big endian)
- PROT = 0: No protocol
= 1: Short protocol
= 2: Detailed protocol
- PLOT < 0: Segmentation marks are written to unit "LPRT"
= 0: No plots
= 1: Time series
= 2: Time series with pre-existing segmentation marks
= 3: Envelope curves
= 4: Envelope curves with pre-existing segmentation marks
= 5: Envelope curves with newly computed segmentation marks
- PANF = p: Specifies first page to be plotted
- PEND = p: Specifies last page to be plotted
- AMPL = r: Time series are amplified by factor "r" prior to
plotting
- NGL = 0: Filter parameter is automatically selected
= q: Filter parameter is explicitly specified
- SCUT = 0: Filter parameter is automatically selected
= q: Filter parameter is explicitly specified
- LG30 = u: Logical unit number associated with input tape
- LPRT = u: Logical unit number of plot-device (default=6;
valid numbers are 46-96)
- PROB = 0: All probands
= n: Only proband "n" is to be processed
- SAVE = 0: No effect
> 0: Newly constructed segmentation table will be stored
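How the IBM and NBIT parameters affect the reading of raw samples can be sketched as follows. This is an illustration under stated assumptions: every sample is taken to occupy one signed 16-bit word (so NBIT must lie between 1 and 16), and samples are scaled by 2^(NBIT-1) to the range [-1, 1). The function name decode_samples is hypothetical and does not correspond to a SEGM routine.

```python
import struct
from typing import List

def decode_samples(raw: bytes, ibm: int = 0, nbit: int = 16) -> List[float]:
    """Convert raw signed 16-bit samples to floats in [-1, 1).

    ibm = 0: little endian (X86/X64); ibm = 1: big endian (IBM/SPARC).
    Assumes one 16-bit word per sample and 1 <= nbit <= 16.
    """
    if len(raw) % 2:
        raise ValueError("odd number of bytes for 16-bit samples")
    if not 1 <= nbit <= 16:
        raise ValueError("nbit must lie in 1..16")
    order = ">" if ibm == 1 else "<"
    count = len(raw) // 2
    ints = struct.unpack(f"{order}{count}h", raw)
    # Full scale of an nbit A/D converter: values run from -2^(nbit-1) to 2^(nbit-1)-1.
    full_scale = float(1 << (nbit - 1))
    return [v / full_scale for v in ints]
```

The same two bytes 0x00 0x40 decode to 0.5 with IBM=0 but to 64/32768 with IBM=1, which is why a wrong IBM setting typically produces noise-like garbage rather than an obviously scaled signal.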
- DEMO: Segmentation of speech signals (marked by crosses)
Example
&&START SEGM=Segmentation of re-formatted bli-tapes (stud600)
TANA=NO,LG30=35,PLOT=5,PEND=7,LPRT=60,SCUT=5
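A parameter line such as the one in the example can be read with a small parser that fills in the default values from the specification list. This is a hypothetical sketch of the KEY=VALUE format as it appears in the example, not SEGM's actual input reader; the &&START header line is not handled, and SCUT and AMPL are treated as real numbers based on their defaults of 7.500 and 1.000.

```python
from typing import Dict, Optional, Union

Value = Optional[Union[int, float, str]]

# Default values transcribed from the specification list.
# TANA has no default ("Undefined"), represented here by None.
DEFAULTS: Dict[str, Value] = {
    "TANA": None, "NREC": 0, "NBIT": 16, "IBM": 0, "PROT": 0, "PLOT": 0,
    "PANF": 1, "PEND": 10, "AMPL": 1.0, "NGL": 0, "SCUT": 7.5,
    "LG30": 30, "LPRT": 6, "PROB": 0, "SAVE": 0,
}

def parse_parameters(line: str) -> Dict[str, Value]:
    """Parse 'KEY=VALUE,KEY=VALUE,...' and merge it over DEFAULTS.

    Each value is converted to the type of its default (str for TANA).
    Unknown keys or items without '=' raise ValueError.
    """
    params = dict(DEFAULTS)
    for item in filter(None, (s.strip() for s in line.split(","))):
        key, sep, value = item.partition("=")
        key = key.strip().upper()
        if not sep or key not in DEFAULTS:
            raise ValueError(f"bad parameter: {item!r}")
        default = DEFAULTS[key]
        value = value.strip()
        if default is None:
            params[key] = value
        elif isinstance(default, float):
            params[key] = float(value)
        else:
            params[key] = int(value)
    return params
```

Applied to the example line, this yields LG30=35, PLOT=5, PEND=7, LPRT=60 and SCUT=5.0, while unspecified parameters such as NBIT=16 and PANF=1 keep their defaults.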
Fig. 20: Our segmentation algorithm determines, in a first step,
the intensity of background noise. Once the noise level has been
determined, the algorithm subdivides the speech signal into
utterances and pauses using the noise level as threshold. Mean
pause duration and utterance duration are used to test the
hypothesis that patients speak more slowly during depression
than they do after recovery.
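The two summary measures named in the caption can be computed from a frame-by-frame pause/utterance labeling, as in the following minimal sketch. The per-frame boolean input and the function name mean_durations are assumptions for illustration; note that this sketch counts every pause, including the leading calibration pause, which a study may prefer to exclude.

```python
from itertools import groupby
from typing import Dict, List, Sequence

def mean_durations(is_speech: Sequence[bool], frame_s: float) -> Dict[str, float]:
    """Mean utterance and pause durations in seconds from per-frame labels.

    Consecutive frames with the same label form one utterance or one pause.
    A category that never occurs gets a mean of 0.0.
    """
    lengths: Dict[str, List[float]] = {"utterance": [], "pause": []}
    for speech, run in groupby(is_speech):
        label = "utterance" if speech else "pause"
        lengths[label].append(sum(1 for _ in run) * frame_s)
    return {label: (sum(v) / len(v) if v else 0.0) for label, v in lengths.items()}
```

With 10 msec frames, pauses of 20 and 30 frames and utterances of 10 and 30 frames give a mean pause duration of 0.25 s and a mean utterance duration of 0.2 s; comparing such values between the depressed and the recovered state is the kind of test the caption describes.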