Segmentation difficulties

The syntax of spoken dialogue may seem fragmentary or disorderly for reasons other than dysfluency or unintelligibility. Some reasons are:

(i) The canonical sentence of written language, as a structure containing a finite verb, is far from being a satisfactory basis for the segmentation of speech into independent syntactic wholes. According to one count by Leech (Biber et al. 1999, Ch. 14), ca. 39% of the independent syntactic units of conversational dialogue have no finite verb: many are single-word utterances, typically consisting of a single interjection in the extended sense of Section 1.5.1. The practice in the compilation of treebanks has often been to use parse brackets (conventionally [S ...S]) to enclose the whole parsable unit, but to make no assumption that what occurs within those brackets should have the structure of a canonical sentence. Thus a stand-alone noun phrase unit, such as No problem, should be parsed simply [S [N No problem N] S]. The [S ...S] brackets may be interpreted as 'sentence' or, say, as '(syntactic) segment', according to the annotator's or user's preference. For our present purpose, the term C-unit will be used for a segment parsed as an [S ...S] which is not part of another [S ...S].
(ii) The criteria for what counts as a C-unit in speech are difficult to determine, and may have to rely on prosodic separation (for example, the boundary of a major tone group or intonation phrase).
(iii) In some dialogue turns, one speaker completes a syntactic construction begun by another speaker.
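
The C-unit definition can be made operational on a bracketed parse. The following is a minimal sketch, assuming a plain-text bracketing in which each label is attached to its bracket (as in [S [N No problem N] S]) and tokens are separated by whitespace; the function name c_units and this input format are illustrative assumptions, not the tooling of any particular treebank. A C-unit is read off as an [S ...S] segment at nesting depth one, so an [S ...S] nested inside another stays part of the unit that contains it.

```python
from typing import List


def c_units(bracketed: str) -> List[str]:
    """Return the words of every top-level [S ...S] segment (C-unit).

    Assumes labels are attached to their brackets ("[S", "S]", "[N",
    "N]") and that tokens are separated by whitespace.
    """
    units: List[str] = []
    depth = 0              # current nesting depth of [S ...S] brackets
    words: List[str] = []  # words collected for the current C-unit
    for token in bracketed.split():
        if token == "[S":
            depth += 1
        elif token == "S]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced S] bracket")
            if depth == 0:
                # Closing an [S ...S] that is not inside another one.
                units.append(" ".join(words))
                words = []
        elif token.startswith("[") or token.endswith("]"):
            continue  # other constituent brackets, e.g. [N or N]
        elif depth > 0:
            words.append(token)  # words outside any [S ...S] are ignored
    if depth != 0:
        raise ValueError("unclosed [S bracket")
    return units
```

For example, c_units("[S [N No problem N] S] [S yeah [S I think so S] S]") returns ['No problem', 'yeah I think so']: the embedded [S I think so S] does not count as a separate C-unit because it is part of another [S ...S].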

There appear to be four methods of segmenting a dialogue into C-units:

The C-unit should be delimited by criteria internal to syntax. That is, where no syntactic link can plausibly be established between one parsable unit and another, they are treated as independent. This solution, however, does not address point (ii) above.
The C-unit should be delimited by prosodic criteria, either alone, or in conjunction with syntactic criteria where these are clear. This solution, obviously, depends on the existence and quality of a prosodic level of annotation.
The C-unit should be delimited by orthographic criteria: that is, by treating sentence-final punctuation marks (specifically periods and question marks) as boundaries. This is the simplest method to apply, assuming that the orthographic transcription is so punctuated. On the other hand, it is the most arbitrary, since punctuation marks are artefacts of the transcription and do not have a warranted linguistic function.
The C-unit should be delimited by pragmatic, functional or discoursal criteria. Apart from the turn boundary, which is no doubt the clearest delimiter one can use for parsing, pragmatic and discoursal criteria are probably no clearer in determining C-units than internal syntactic criteria. However, in the development of language engineering dialogue systems, considerable effort has been invested in the recognition of functionally-defined segments corresponding to dialogue acts. Moreover, in this context, the importance of syntactic annotation lies in facilitating the automatic recognition and delimitation of such functional units, rather than in parsing as an end in itself. Hence there is much to be said for relying on functional criteria as the most valuable guide to segmentation for purposes of dialogue annotation.
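
The orthographic method is the easiest of the four to mechanize. The Python sketch below assumes each turn is available as a hypothetical (speaker, text) pair; it splits the text wherever a period or question mark is followed by whitespace, and, following the observation that the turn boundary is the clearest delimiter, it always closes a segment at the end of a turn, even when the turn is unpunctuated. The names Turn and orthographic_segments are illustrative, not part of any standard transcription toolkit.

```python
import re
from typing import List, Tuple

# Hypothetical representation of a turn: (speaker, transcribed text).
Turn = Tuple[str, str]

# Orthographic criterion: a boundary after '.' or '?' followed by
# whitespace; the whitespace itself is consumed by the split.
_BOUNDARY = re.compile(r"(?<=[.?])\s+")


def orthographic_segments(turns: List[Turn]) -> List[Tuple[str, str]]:
    """Split each turn at sentence-final '.' or '?'.

    Every turn boundary also closes a segment, so no segment ever
    spans two speakers.
    """
    segments: List[Tuple[str, str]] = []
    for speaker, text in turns:
        for piece in _BOUNDARY.split(text.strip()):
            if piece:  # skip empty turns
                segments.append((speaker, piece))
    return segments
```

Applied to [("B", "Let's say Tuesday at"), ("A", "ten o'clock.")], the function returns two segments, one per speaker, so a construction that one speaker begins and another completes is always cut in two. The arbitrariness of the method shows up as well: a transcript containing "ca. 39" would be split after "ca.", because the sketch cannot tell an abbreviation from a sentence end.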

Handbook of Multimodal and Spoken Dialogue Systems
Resources, Terminology and Product Evaluation