Institute for Response-Genetics (e.V.)

Chairman: Prof. Dr. Hans H. Stassen

Psychiatric Hospital (KPPP), University of Zurich


SEGM — Segmentation of speech recordings in pauses/utterances

This program "learns" the characteristics of a speech signal recorded under given experimental conditions. The algorithm expects a pause of at least 100 ms at the very beginning of the recording so that it can calibrate the signal amplification against the actual background noise level (optimizing the signal-to-noise ratio). Once the calibration parameters have been determined, the entire recording is processed to identify pauses, utterances, and artifacts. The quality of the resulting segmentation can be checked with the PLOT option.
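The calibrate-then-segment idea can be illustrated with a minimal sketch. This is not SEGM's actual algorithm: the frame length, the threshold factor, and the use of mean absolute amplitude as the noise measure are assumptions made for illustration, and artifact detection is left out.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (label, first sample, one past last sample)


def segment(samples: List[float], rate: int, calib_ms: int = 100,
            frame_ms: int = 10, factor: float = 3.0) -> List[Segment]:
    """Split a signal into pauses and utterances using a noise threshold.

    calib_ms follows the 100 ms leading pause named in the text;
    frame_ms and factor are hypothetical tuning values.
    """
    calib = samples[:max(1, rate * calib_ms // 1000)]
    if not calib:
        return []
    # Noise level: mean absolute amplitude of the leading pause.
    noise = sum(abs(s) for s in calib) / len(calib)
    threshold = factor * noise

    frame = max(1, rate * frame_ms // 1000)
    segments: List[Segment] = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        level = sum(abs(s) for s in chunk) / len(chunk)
        label = "utterance" if level > threshold else "pause"
        end = start + len(chunk)
        if segments and segments[-1][0] == label:
            # Same label as the previous frame: extend that segment.
            segments[-1] = (label, segments[-1][1], end)
        else:
            segments.append((label, start, end))
    return segments
```

A recording without a quiet lead-in would give an inflated noise estimate in this sketch, which is one way to see why the initial pause matters.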

            Specificationlist:        SEGM
            ------------------------------
            A8 TANA              Undefined
            I4 NREC                      0  Default-value
            I4 NBIT                     16  Default-value
            I4 IBM                       0  Default-value
            I4 PROT                      0  Default-value
            I4 PLOT                      0  Default-value
            I4 PANF                      1  Default-value
            I4 PEND                     10  Default-value
            R4 AMPL                  1.000  Default-value
            I4 NGL                       0  Default-value
            R4 SCUT                  7.500  Default-value
            I4 LG30                     30  Default-value
            I4 LPRT                      6  Default-value
            I4 PROB                      0  Default-value
            I4 SAVE                      0  Default-value
 
            01 TANA Name of input tape
            02 NREC Number of files to be processed
            03 NBIT Specifies number of bits used in A/D-conversion
            04 IBM  Specifies data format (IBM/SPARC vs. X86/X64)
            05 PROT Controls protocol output
            06 PLOT Controls graphic output
            07 PANF Specifies first page to be plotted
            08 PEND Specifies last page to be plotted
            09 AMPL Specifies amplification of signal prior to plotting
            10 NGL  Filter parameter for segmentation (smoothness)
            11 SCUT Filter parameter for segmentation (flanks)
            12 LG30 Logical unit number associated with input tape
            13 LPRT Logical unit number of plot-device
            14 PROB Specifies proband to be processed
            15 SAVE Saves newly constructed segmentation table
            16 DEMO Examples that illustrate program function
 
            - TANA:     Name of the input tape ("NO" means that
                        DSN-name will not be checked)
 
            - NREC = 0: All files
                   = n: First "n" files are to be processed
 
            - NBIT = n: Number of bits used in A/D-conversion (default=16)
 
            - IBM  = 0: Input data are stored in X86/X64-mode   (little endian)
                   = 1: Input data are stored in IBM/SPARC-mode (big endian)
 
            - PROT = 0: No protocol
                   = 1: Short protocol
                   = 2: Detailed protocol
 
            - PLOT < 0: Segmentation marks are written to unit "LPRT"
                   = 0: No plots
                   = 1: Time series
                   = 2: Time series with pre-existing segmentation marks
                   = 3: Envelope curves
                   = 4: Envelope curves with pre-existing segmentation marks
                   = 5: Envelope curves with newly computed segmentation marks
 
            - PANF = p: Specifies first page to be plotted
 
            - PEND = p: Specifies last page to be plotted
 
            - AMPL = r: Time series are amplified by factor "r" prior to
                        plotting
 
            - NGL  = 0: Filter parameter is automatically selected
                   = q: Filter parameter is explicitly specified
 
            - SCUT = 0: Filter parameter is automatically selected
                   = q: Filter parameter is explicitly specified
 
            - LG30 = u: Logical unit number associated with input tape
 
            - LPRT:     Logical unit number of plot-device (standard=6;
                        valid numbers are 46-96)
 
            - PROB = 0: All probands
                   = n: Only proband "n" is to be processed
 
            - SAVE = 0: No effect
                   > 0: Newly constructed segmentation table will be stored
 
 
            - DEMO: Segmentation of speech signals (marked by crosses)
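As an illustration of how NBIT and IBM interact when raw samples are read, the following sketch decodes a byte string into integer samples. It assumes that samples are signed integers stored in whole bytes, which the parameter list does not state; the function name `decode_samples` is hypothetical.

```python
from typing import List


def decode_samples(raw: bytes, nbit: int = 16, ibm: int = 0) -> List[int]:
    """Decode raw A/D samples according to the NBIT and IBM settings.

    ibm=0 selects little endian (X86/X64), ibm=1 big endian (IBM/SPARC).
    Assumption: signed samples padded to a whole number of bytes.
    """
    width = (nbit + 7) // 8                   # bytes per sample
    order = "big" if ibm == 1 else "little"
    usable = len(raw) - len(raw) % width      # ignore a trailing partial sample
    return [int.from_bytes(raw[i:i + width], order, signed=True)
            for i in range(0, usable, width)]
```

Reading a big-endian file with IBM=0 (or the reverse) swaps the bytes of every sample, so the decoded signal typically looks like noise.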
 
           

Example

 
            &&START SEGM=Segmentation of re-formatted bli-tapes (stud600)
             TANA=NO,LG30=35,PLOT=5,PEND=7,LPRT=60,SCUT=5
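To make explicit which values the example overrides and which remain at their defaults, the parameter line can be merged with the specification list. The sketch below is a hypothetical helper, not part of SEGM; the defaults are copied from the list above, and the comma-separated KEY=VALUE syntax is inferred from this single example.

```python
# Defaults taken from the SEGM specification list; TANA has no default.
DEFAULTS = {
    "TANA": None, "NREC": 0, "NBIT": 16, "IBM": 0, "PROT": 0,
    "PLOT": 0, "PANF": 1, "PEND": 10, "AMPL": 1.0, "NGL": 0,
    "SCUT": 7.5, "LG30": 30, "LPRT": 6, "PROB": 0, "SAVE": 0,
}


def parse_card(card: str) -> dict:
    """Merge a comma-separated KEY=VALUE line with the defaults."""
    params = dict(DEFAULTS)
    for item in card.split(","):
        key, _, value = item.strip().partition("=")
        if key not in DEFAULTS:
            raise KeyError(f"unknown parameter: {key}")
        default = DEFAULTS[key]
        if isinstance(default, float):
            params[key] = float(value)   # real-valued parameters (AMPL, SCUT)
        elif isinstance(default, int):
            params[key] = int(value)     # integer parameters
        else:
            params[key] = value          # TANA is kept as text
    return params


params = parse_card("TANA=NO,LG30=35,PLOT=5,PEND=7,LPRT=60,SCUT=5")
```

For this example the result requests envelope curves with newly computed segmentation marks (PLOT=5) for pages 1 to 7, written to unit 60, with the input tape on unit 35 and its DSN-name left unchecked.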
 
           
Fig. 20: Our segmentation algorithm determines, in a first step, the intensity of background noise. Once the noise level has been determined, the algorithm subdivides the speech signal into utterances and pauses using the noise level as threshold. Mean pause duration and utterance duration are used to test the hypothesis that patients speak more slowly during depression than they do after recovery.
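The two summary measures named in the caption can be computed from a segmentation table as in the sketch below. The table layout (label, start sample, end sample) and the function name are assumptions for illustration; the format SEGM actually stores is not described here.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (label, first sample, one past last sample)


def mean_durations(segments: List[Segment], rate: int) -> Dict[str, float]:
    """Return the mean duration in seconds for each segment label."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for label, start, end in segments:
        totals[label] = totals.get(label, 0.0) + (end - start) / rate
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}
```

Comparing these means between a recording made during depression and one made after recovery is the kind of test the caption refers to.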