Title: Cohesion in coherent technical texts

Author: Holger Schauer

Published as: ISBN 3-89825-730-4, Verlag Dissertation.de, Berlin, Germany

Automated understanding of texts requires more than the simple concatenation of information extracted from individual sentences. Complex relations between sentences, such as "Cause" or "Contrast", must also be accounted for. A detailed analysis of a text corpus yielded a set of criteria that allow such relations and their accompanying discourse structure to be recognized automatically. The work was motivated by an integrated view of coherence as part of text understanding in general, with a focus on a concrete implementation in an automated system.

In a first step, cue phrases are examined: linguistic devices that hint at coherence relations. For instance, an occurrence of the conjunction "because" marks a causal connection. However, the analysis yielded a surprising observation: in the corpus used, only relatively few coherence relations can be attributed to cue phrases. This result shows that further criteria need to be examined.
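The role of a cue phrase as a surface marker for a relation can be sketched as a simple lexicon lookup. The lexicon `CUE_PHRASES` and the functions `cue_relations` and `cue_coverage` are hypothetical illustrations, not the inventory or procedure used in the dissertation; a realistic treatment would also have to deal with ambiguous and multi-word cues.

```python
import re

# Hypothetical cue-phrase lexicon; a real inventory would be larger and would
# need to handle ambiguous cues (e.g. temporal vs. causal "since") and multi-word cues.
CUE_PHRASES = {
    "because": "Cause",
    "therefore": "Cause",
    "however": "Contrast",
    "but": "Contrast",
    "although": "Contrast",
}


def cue_relations(sentence: str) -> list:
    """Return the coherence relations signalled by single-word cues in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [CUE_PHRASES[t] for t in tokens if t in CUE_PHRASES]


def cue_coverage(sentences: list) -> float:
    """Fraction of sentences that contain at least one known cue phrase."""
    if not sentences:
        return 0.0
    marked = sum(1 for s in sentences if cue_relations(s))
    return marked / len(sentences)
```

On the toy input `["The device stops because the battery is empty.", "Connect the cable to port A.", "However, port B is faster."]`, `cue_coverage` returns 2/3, since the middle sentence carries no cue. A low coverage value on real text is the kind of measurement behind the observation that few relations are explicitly marked.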

One such criterion, relevant mainly for the structure of a discourse, is provided by referential links in a text, e.g., the use of a pronoun like "it" that refers to an object mentioned earlier in the text. An empirical analysis shows that it is possible to specify an algorithm that integrates the analysis of referential links and of discourse structure. Once referential expressions have been resolved successfully, it is possible to determine between which textual units coherence relations may be established.
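How resolved references can restrict where a new unit attaches may be sketched as follows, under the simplifying assumption that every textual unit is annotated with the set of discourse referents it mentions. Whereas the algorithm described above integrates both analyses, this sketch treats the output of reference resolution as a given input. `Unit` and `attachment_candidates` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """A textual unit and the discourse referents it mentions (hypothetical representation)."""
    text: str
    referents: set = field(default_factory=set)


def attachment_candidates(units: list, resolved: set) -> list:
    """Return indices of earlier units that share a referent with the new unit.

    `resolved` holds the antecedents to which the new unit's anaphors
    (e.g. "it") have already been resolved; resolution itself is not modelled.
    An empty result means the referential links give no hint.
    """
    return [i for i, unit in enumerate(units) if unit.referents & resolved]
```

For the units "The printer has a paper tray." (`{"printer", "tray"}`) and "The manual lists spare parts." (`{"manual", "parts"}`), a following unit "It holds 250 sheets." whose "it" resolves to `"tray"` yields the candidate list `[0]`, so a coherence relation would be sought between the new unit and the first sentence.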

In general, cue phrases or similar explicit markers cannot be expected to be available in every case. Given a representation of the semantic content of a text, the choice of which coherence relation to derive depends on whether constraints applied to that representation are satisfied. This requires a sophisticated model of the domain of the text under consideration; for instance, knowledge about complex situations is often necessary. However, the computation of coherence relations makes use not only of the semantic content of a text but also of explicit linguistic hints such as cue phrases. Integrating such cohesive devices results in an approach that has a lower complexity than comparable knowledge-based accounts of coherence, and that uses and influences the results of other understanding processes at the same time.
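The interplay of semantic constraints and cue phrases can be illustrated with a toy model. The semantic representation (a dict with `event`, `subject` and `polarity`), the domain knowledge in `CAUSES`, and the two constraints in `CONSTRAINTS` are hypothetical simplifications, not the dissertation's domain model. When a cue phrase already suggests a relation, only that relation's constraint is tested; otherwise every candidate is tried in turn, and the returned count of constraint checks makes the saving visible.

```python
from typing import Callable, Optional

# Hypothetical semantic representation of a textual unit:
# {"event": str, "subject": str, "polarity": bool}
Sem = dict

# Hypothetical domain knowledge: pairs (cause_event, effect_event).
CAUSES = {("battery_empty", "device_stops")}


def _cause(a: Sem, b: Sem) -> bool:
    # Cause holds if the domain model links the two events in either direction.
    return (a["event"], b["event"]) in CAUSES or (b["event"], a["event"]) in CAUSES


def _contrast(a: Sem, b: Sem) -> bool:
    # Contrast holds if both units describe the same subject with opposite polarity.
    return a["subject"] == b["subject"] and a["polarity"] != b["polarity"]


# Candidate relations, tested in this order when no cue phrase is present.
CONSTRAINTS: dict[str, Callable[[Sem, Sem], bool]] = {
    "Cause": _cause,
    "Contrast": _contrast,
}


def derive_relation(a: Sem, b: Sem, cue_relation: Optional[str] = None) -> tuple[Optional[str], int]:
    """Return the first relation whose constraint holds, plus the number of checks made.

    If a cue phrase suggested `cue_relation`, only that constraint is tested.
    """
    if cue_relation in CONSTRAINTS:
        candidates = [cue_relation]
    else:
        candidates = list(CONSTRAINTS)
    checks = 0
    for relation in candidates:
        checks += 1
        if CONSTRAINTS[relation](a, b):
            return relation, checks
    return None, checks
```

For "The printer is fast, but it cannot print in colour", represented as two units with the same subject and opposite polarity, `derive_relation` needs two checks without a cue and one check with the cue relation `"Contrast"`. In this sketch a cue only narrows the search: if the suggested relation's constraint fails, the function returns `None` instead of falling back to the other candidates, which is one of several possible design choices.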


Last modified: Thursday, 15 January 2004, 16:02:11